Two-Dimensional Linkage Study Techniques

Technical Field

Versions of the present invention are in the field of molecular biology. Some versions are specifically in the area of finding the chromosomal location of genes that cause genetic characteristics such as human disease.

Introduction

Conventional linkage study techniques have limited power to localize trait-causing genes (trait-causing polymorphisms) of modest effect, such as many human disease polymorphisms. The two-dimensional linkage study techniques of this application are powerful new techniques for localizing genes (polymorphisms), especially those of modest effect.

Chromosomes, heredity, genes, markers and alleles

Chromosomes are large molecules that carry the information for the inheritance of physical (genetic) characteristics or traits. In human beings, for example, parents pass a copy of half of their chromosomes to their offspring during reproduction. By doing this, each parent passes some of his or her physical characteristics to his or her offspring. Any chromosome of a living creature is made of a large string-like molecule of DNA. Chromosomes are essentially very long strings of DNA. Genes are small pieces of a chromosome that cause or determine inherited genetic characteristics. (In this application, the term gene means a polymorphism that determines a genetic characteristic; the term does not mean an entire gene structure with a promoter region, introns, etc.) Markers are any segments of DNA on a chromosome that can be identified and whose chromosomal location is known (at least to some extent). Markers are like milestones along the very long string-like molecule of DNA that makes up a chromosome. Both a gene and a marker can come in different forms on different chromosomes. These different forms are known as different alleles, and when a gene or marker comes in different forms it is said to be "polymorphic". For example, a bi-allelic marker comes in two (bi) different forms.

Linkage

If a gene allele and a marker allele occur as part of the genetic makeup of individuals more frequently than would be expected on the basis of chance, then it is possible to infer that the gene and the marker are linked. If a gene allele and a marker allele are inherited together more frequently than would be expected if the gene and the marker were on different chromosomes, then it is possible to infer that the gene and the marker are linked. Linkage of a gene and a marker usually occurs because the gene and the marker are close together on a chromosome. There are different degrees of linkage. Establishing linkage, especially strong linkage, between a gene and a marker can be very valuable. This is especially true if the precise location and other characteristics of the gene are not known. By establishing linkage, especially strong linkage, between a known marker and an unknown gene, it is possible to locate the gene near the chromosomal location of the known marker. This can be very valuable if the gene is an important gene, such as a disease-causing gene, and can help cure the disease.

Linkage Studies

Linkage studies are a method of establishing linkage between a marker and a gene or genes. Linkage studies are used to statistically correlate the occurrence of a genetic characteristic such as a disease (caused by a gene or genes) with a marker on a chromosome. One way this is done is by statistically correlating a specific allele of a marker with a genetic characteristic for a set of individuals by showing that individuals with the characteristic inherit the marker allele more often than individuals without the characteristic. The set of individuals is usually referred to as a sample of individuals. An example of a sample of individuals is people with a disease and similar people (matched controls) without the disease. Another example of a sample of individuals is a group of people, some of whom have the same disease, each of the people in the group being related to one or more of the other people in the group (i.e. families, sibships, pedigrees). The presence or absence of a marker allele in the chromosomal DNA of each individual is usually determined from genotype data at the marker for each individual.

There are different types of linkage study techniques, using different types of samples and different statistical measures of the correlation of a marker and a genetic characteristic. One example of a type of linkage study technique is the affected sib pair (ASP) test. Another example is the transmission disequilibrium test (TDT), which is an association-based linkage test. This is a dynamic, changing area within the field of human genetics.
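As a concrete illustration of an association-based linkage statistic, the TDT is commonly computed as a McNemar-type chi-square on transmissions from heterozygous parents to affected offspring. The following is a minimal sketch under that standard formulation; the function name and the counts are hypothetical and are not taken from this application.

```python
def tdt_statistic(transmitted_a: int, transmitted_b: int) -> float:
    """McNemar-type TDT chi-square with 1 degree of freedom.

    transmitted_a: number of times heterozygous parents passed marker
                   allele A to an affected child
    transmitted_b: number of times they passed the other allele instead
    """
    total = transmitted_a + transmitted_b
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (transmitted_a - transmitted_b) ** 2 / total


# Hypothetical counts, for illustration only.
stat = tdt_statistic(60, 40)
print(stat)  # 4.0
```

A value above about 3.84 corresponds to nominal significance at the 0.05 level for one degree of freedom, before any correction for testing many markers.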

Linkage Studies and the "Scanning" of Chromosomal Regions

There are significant advantages in using several markers simultaneously to perform a linkage study with a genetic characteristic and a sample of individuals, especially when the relative positions of the markers on a chromosome are known. Such a linkage study allows searching for statistical evidence of linkage between markers in one or more regions of a chromosome or chromosomes and the gene or genes that determine the genetic characteristic. The results of the study for each marker can then be compared with the results for other markers, knowing the relative chromosomal positions of all the markers in the study. In this way, regions of a chromosome or even whole chromosomes can be "scanned" for evidence of linkage to a gene or genes causing a genetic characteristic. The relative positions of markers on chromosomes of a species of creatures are given by various kinds of chromosomal maps for the species. (There are several different kinds of marker maps, such as physical maps, genetic maps and radiation hybrid maps.)

Sets of Markers for Linkage Studies and "Scanning" Chromosomes

An appropriate set of markers from a region of a chromosome can be chosen so that the region can be "scanned" for evidence of linkage of markers in the region to a gene or genes that cause a genetic characteristic. As explained above, this scanning is done by using the markers in linkage studies. Strong positive evidence for linkage of the markers (from the scanned chromosomal region) to a gene or genes responsible for a characteristic or trait is strong evidence that a trait-causing gene or genes are located within the chromosomal region.

Conventional Techniques for Choosing Sets of Markers to Scan Chromosomes with Linkage Studies

Conventional techniques choose sets of markers to scan a chromosomal region by choosing markers according to each marker's chromosomal location within the region. In a set of microsatellite markers described in 1994 for use in linkage studies, the markers were approximately evenly spaced, with average spacing between markers being 13 centiMorgans. The markers were distributed approximately evenly across the entire human genome (all human chromosomes) and were also selected because genotype data at the markers for individuals could be obtained by a semi-automated method.¹ A recent (1998) linkage study of the disease schizophrenia used a set of 310 microsatellite markers distributed approximately evenly across the entire human genome with average spacing of 11 centiMorgans between markers.² In a recent (1998) simulation of linkage studies to defend the practice of two-stage genome scanning, markers were spaced evenly every 10 cM (centiMorgans) in an initial, sparser, first-stage scan and evenly every 1 cM in a follow-up, denser, second-stage scan.³ Following up positive linkage study results from chromosomal regions in a sparse, first-stage scan with a second, denser scan that focuses on the regions with positive first-stage results is a common technique. In these conventional studies, as is common, markers were chosen to be about evenly spaced across the chromosomal regions studied. In this manner, as is conventional, a one-dimensional structure such as an entire genome, a chromosome or a region of a chromosome is "covered" by markers in order to scan the entire genome, chromosome or chromosomal region with a linkage study. (These conventional techniques¹,²,³ are not admitted to be prior art by their mention in this background.) (There is a possibly confusing double meaning of the term "marker map". It should be noted that a set of markers distributed along a chromosomal region, chromosome, or genome for linkage studies is also sometimes referred to as a "marker map" for use in chromosomal scanning by linkage studies. In addition, chromosomal or genetic maps of markers are also referred to as "marker maps".)

Conventional Techniques for Choosing Sets of Markers to Scan Chromosomal Regions are Essentially One Dimensional

Because DNA is a string-like molecule, a chromosomal region(s), chromosome(s) and genome are essentially one dimensional in terms of the chromosomal location of markers and genes. As has been stated, conventional linkage study techniques scan a chromosomal region(s), chromosome(s) or genome by using markers distributed approximately evenly along the length of the chromosomal region(s), chromosome(s) or genome respectively. These conventional techniques focus primarily on the chromosomal location of markers used in a scan. These conventional techniques have an essentially one-dimensional perspective.

Population Frequency of Marker Alleles and Gene Alleles

As described, chromosomal location of each marker is an important and unique characteristic of each marker and marker allele. Another characteristic of each polymorphic marker and each of the marker's alleles is the population frequency of each marker allele. A population is a group (usually a large group) of individuals. A population frequency of a particular marker allele is the proportion of individual chromosomes in a population in which the particular marker occurs as the particular marker allele. For any bi-allelic marker, knowing the least common allele frequency of the marker establishes both of the allele frequencies of the marker. This is because the two allele frequencies of a bi-allelic marker sum to 1. Each gene allele also has a population allele frequency, or allele frequency for short. Thus, each gene allele has a particular chromosomal location and allele frequency (for a particular population). In the case of an unknown gene, the gene's chromosomal location and allele frequencies are not specifically known.

¹ Reed, et al.: Chromosome-specific microsatellite sets for fluorescence-based, semi-automated genome mapping. Nature Genetics, July 1994, vol. 7, pp. 390-395.

² Levinson, et al.: Genome Scan of Schizophrenia. Am J Psychiatry, June 1998, vol. 155, pp. 741-750.

³ Kruglyak, et al.: Linkage Thresholds for Two-stage Genome Scans. Am J Hum Genet, 1998, vol. 62, pp. 994-996.

Marker Allele Population Frequency in Conventional Linkage Study Scans

It is important to note that little attention was paid to the population allele frequencies of the markers used in the conventional linkage scans cited above. In the two studies cited above under conventional scanning techniques¹,², marker allele frequency is referred to only peripherally as average marker heterozygosity, which is related to average marker allele frequency and the number of alleles (2, 3, 4, 5, etc.) at each marker. In the simulated scan cited above³, the markers are stipulated to have four alleles that all have exactly the same allele frequency of 0.25 (heterozygosity 0.75). It is important to note that while the chromosomal location of the markers in all these conventional scans was systematically varied over the entire genome (all the human chromosomes), nothing was said about systematically varying the allele frequencies of the markers in any of the scans. This is typical of conventional linkage study scans of genomes, chromosomes and chromosomal regions.
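The relationship between allele frequencies and heterozygosity mentioned above can be checked numerically. The sketch below uses the standard expected-heterozygosity formula (one minus the sum of squared allele frequencies); the function name is illustrative and not part of this application.

```python
def heterozygosity(allele_freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Four equally frequent alleles, as stipulated in the simulated scan.
print(heterozygosity([0.25, 0.25, 0.25, 0.25]))  # 0.75

# Bi-allelic markers: the least common allele frequency fixes the other one.
print(heterozygosity([0.5, 0.5]))                # 0.5
print(round(heterozygosity([0.9, 0.1]), 2))      # 0.18
```

For a bi-allelic marker, heterozygosity rises steadily as the least common allele frequency goes from 0 to 0.5, which is why the two quantities can stand in for one another in this context.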

A Conventional View Of Bi-allelic Markers And Linkage Studies

We cite here a well-known reference that discusses the conventional view of bi-allelic marker usefulness in linkage scans of chromosomes. In 1997 Kruglyak carried out computer simulations of the "information content" of markers that are part of various different marker maps.⁴ For bi-allelic markers, his results showed that the optimum allele frequencies for bi-allelic markers used in linkage studies are 0.5/0.5 in order to achieve the greatest information content. However, allele frequency patterns other than the optimum 0.5/0.5 for bi-allelic markers gave acceptable levels of information content, depending on the density of the marker map (or set of markers) chosen for the linkage study.

There are some important observations regarding this reference.⁴ First, there is no advantage noted in this reference for choosing bi-allelic markers so that the set of chosen markers (or marker map) used for linkage studies is such that the markers systematically vary in allele frequency. Thus, just as in the recent conventional linkage study scans cited above, there is no definite thought to using markers of systematically varying allele frequencies. The greatest information content is given by bi-allelic markers with allele frequencies close to the optimum of 0.5/0.5. Given the density of reasonably polymorphic SNPs predicted in this reference, at least one every 1 kb or 1,000 per cM, it is probable that even for quite dense maps, there will be so many acceptable SNPs available that all of the SNPs in an appropriate marker map could have the optimum allele frequencies of approximately 0.5/0.5. Secondly, bi-allelic markers with lower least common allele frequencies, less than 0.3 (0.7/0.3) or 0.2 (0.8/0.2), are viewed unfavorably for linkage studies in this reference. Thirdly, the early version of the criterion of "information content" of markers used in this reference was based on sib pair analysis, and the later, current version of the criterion does not depend on any particular test for linkage.⁵,⁶ Thus, the criterion of information content in this reference has never specifically employed the TDT (transmission disequilibrium test) or any association-based test, whereas the two-dimensional linkage study techniques of this application are based on a completely different perspective of using association-based tests. (This reference⁴ is not admitted to be prior art with respect to the present invention by its mention in this background.)

⁴ Kruglyak: The use of a genetic map of biallelic markers in linkage studies. Nature Genetics, September 1997, vol. 17, pp. 21-24.

Increased Power of the TDT (transmission disequilibrium test)

Characteristics of a new type of linkage test, the TDT (transmission disequilibrium test), were described in 1993. The inventor, R.E. McGinnis, was one of the authors of this reference.⁷ In 1996, Risch and Merikangas argued that conventional linkage analysis has limited power to detect genes of modest effect. Risch and Merikangas also attempted to illustrate the increased power of association-based linkage tests such as the TDT over other types of conventional linkage tests.⁸ However, Risch and Merikangas' analysis was criticized by Muller-Myhsok and Abel as being based on the optimal assumption that the analyzed allele was the disease allele itself. Muller-Myhsok and Abel concluded that researchers should be aware that the power of association studies such as the TDT can be greatly diminished in more common, less optimal situations.⁹ In their response to Muller-Myhsok and Abel's letter, Risch and Merikangas essentially agreed with the logic of Muller-Myhsok and Abel's criticism. Risch and Merikangas stated that to a large extent, the expectation with respect to linkage disequilibrium across the genome is uncharted territory.¹⁰ (None of the references in this paragraph is admitted to being prior art with respect to the present invention by their mention in this background.)

More Detailed Studies of the Power of the TDT

The inventor, R.E. McGinnis, has done extensive investigations on the power of the TDT. His observations and calculations of the increased power of the TDT in many situations have been published.¹¹ In this paper a general framework for determining the power of the TDT in many different situations is presented. The analysis of Risch and Merikangas⁸ and others is shown by the inventor to be a special case of his general framework. His observations and calculations published in this paper have shown that the TDT has increased power in more common, less optimal situations as well as the less common, optimal situation cited by Muller-Myhsok and Abel.⁹ As opposed to the observation of Muller-Myhsok and Abel, the inventor's calculations indicate that association tests such as the TDT have increased power in typical situations even when the ratio m/p departs significantly from unity and/or the linkage disequilibrium between the analyzed (marker) allele and disease polymorphism is only half its maximum possible value. The inventor arrived at these conclusions independently and did not derive them from others.

⁵ Kruglyak, et al.: Complete Multipoint Sib-Pair Analysis of Qualitative and Quantitative Traits. Am J Hum Genet, 1995, vol. 57, pp. 439-454.

⁶ Kruglyak, et al.: Parametric and Nonparametric Linkage Analysis: A Unified Multipoint Approach. Am J Hum Genet, 1996, vol. 58, pp. 1347-1363.

⁷ Spielman, R.S., McGinnis, R.E., Ewens, W.J.: Transmission Test for Linkage Disequilibrium: The Insulin Gene Region and Insulin-dependent Diabetes Mellitus (IDDM). Am J Hum Genet, 1993, vol. 52, pp. 506-516.

⁸ Risch, N. and Merikangas, K.: The Future of Genetic Studies of Complex Human Diseases. Science, 13 September 1996, vol. 273, pp. 1516-1517.

⁹ Muller-Myhsok, B. and Abel, L.: Technical Comments: The Future of Complex Diseases. Science, 28 February 1997, vol. 275, pp. 1328-1329.

¹⁰ Risch, N. and Merikangas, K.: Technical Comments: The Future of Complex Diseases. Science, 28 February 1997, vol. 275, p. 1330.
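The notion of linkage disequilibrium being "half its maximum possible value" can be made concrete with a normalized disequilibrium measure. The sketch below uses Lewontin's D' as one common choice; whether the cited paper uses exactly this normalization is an assumption here, and the frequencies are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Signed D': disequilibrium D divided by its maximum attainable magnitude.

    p_ab: frequency of chromosomes carrying both marker allele A and
          gene allele B
    p_a:  frequency of marker allele A
    p_b:  frequency of gene allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when an allele frequency is 0 or 1")
    return d / d_max


# Hypothetical example: marker allele frequency 0.2, gene allele frequency 0.3,
# joint frequency 0.13, giving disequilibrium at half its maximum.
print(round(d_prime(0.13, 0.2, 0.3), 3))  # 0.5
```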

A Major Conclusion Drawn by the Inventor about the TDT and Linkage Studies: Using Bi-allelic Markers of Systematically Varying Allele Frequencies Increases the Power of Linkage Studies Using the TDT

The inventor's calculations and observations about the increased power of the TDT in more common, less optimal situations led him to the conclusion that the power of linkage studies using the TDT is greatly increased under some conditions. Under some conditions, the power of the TDT in a linkage study using bi-allelic markers is greatly increased when each of one or more of the bi-allelic markers used in the study fulfills two criteria: (1) the allele frequencies of each of the one or more of the bi-allelic markers are similar to (but not necessarily the same as, or even approximately the same as) the allele frequencies of an unknown bi-allelic gene causing a disease under study; and (2) each of the one or more bi-allelic markers is in some degree of linkage disequilibrium with the gene. Thus for a typical linkage study using bi-allelic markers and the TDT, to increase the likelihood of conditions occurring that increase the power of the TDT in the linkage study, the bi-allelic markers used in the study are chosen so that the least common allele frequencies of the markers vary systematically over a range or subrange of least common allele frequency. This major conclusion of the inventor's research is quoted directly from his unpublished manuscript that was included with previously filed U.S. Provisional Patent Applications: "This example is typical and highlights perhaps the most important finding of this paper; namely the importance of using bi-allelic markers with heterozygosity similar to that of a bi-allelic disease gene. Indeed, since a majority of susceptibility loci may be bi-allelic, the judicious use of bi-allelic markers of both high, medium and low heterozygosity may be crucial in order to detect and replicate linkages to loci conferring modest disease risk." (page 25) (In this context the phrase "bi-allelic markers with heterozygosity similar to that of a bi-allelic disease gene" is essentially equivalent to "bi-allelic markers with individual allele frequencies similar to those of a bi-allelic disease gene", and "bi-allelic markers of both high, medium and low heterozygosity" is essentially equivalent to the phrase "bi-allelic markers whose least common individual allele frequencies are high, medium and low".)

¹¹ McGinnis, R.E.: Hidden Linkage: Comparison of the affected sib pair (ASP) test and transmission disequilibrium test (TDT). Annals of Human Genetics, 1998, vol. 62, pp. 159-179.

Systematically Varying Both Marker Chromosomal Location and Marker Allele Frequency of Markers in Linkage Studies

The inventor's calculations and observations have demonstrated the increased power of the TDT in more common, less optimal situations when (1) a bi-allelic marker and bi-allelic gene have similar but not identical allele frequencies and (2) the marker and gene are in some degree of linkage disequilibrium. Thus, for a typical linkage study using bi-allelic markers and the TDT, to increase the likelihood of both criteria (1) and (2) occurring for one or more markers, so as to increase the power of the TDT in the linkage study, the bi-allelic markers used in the study are chosen so that the least common allele frequencies of the markers vary systematically over a range or subrange of least common allele frequency AND the chromosomal locations of the markers vary systematically over one or more chromosomes or chromosomal regions. The bi-allelic markers are also chosen so that the markers' chromosomal locations and least common allele frequencies vary systematically in an essentially independent manner.
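One simple way to picture location and frequency varying "in an essentially independent manner" is a Cartesian product of target locations and target frequencies. The following is only an illustrative sketch; the function name, spacing and frequency values are hypothetical and are not prescribed by this application.

```python
from itertools import product


def target_grid(cl_start, cl_stop, cl_step, freqs):
    """Return target (location_cM, least_common_allele_freq) pairs.

    Every target location is paired with every target frequency, so the two
    coordinates vary independently of one another.
    """
    n_steps = int(round((cl_stop - cl_start) / cl_step))
    locations = [cl_start + i * cl_step for i in range(n_steps + 1)]
    return list(product(locations, freqs))


# Hypothetical design: locations at 0, 10 and 20 cM, each sought at three
# least common allele frequencies.
targets = target_grid(0.0, 20.0, 10.0, [0.1, 0.3, 0.5])
print(len(targets))  # 9
print(targets[0])    # (0.0, 0.1)
```

In practice real markers would then be chosen as close as possible to each target pair; that selection step is not shown.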

Two-dimensional Linkage Study Techniques

As has been stated, conventional linkage study scanning techniques use markers that are distributed approximately evenly in the dimension of chromosomal location. These conventional, one-dimensional scanning techniques focus primarily on the chromosomal location of markers used in a scan and give little attention to the dimension of allele frequency.¹⁻³

One of the main implications of the inventor's work is to use a set of bi-allelic markers for a typical linkage study using the TDT (or other association-based linkage test) wherein the chromosomal locations and least common allele frequencies of the markers in the set systematically vary in an essentially independent manner over the dimensions of chromosomal location and least common allele frequency respectively. This is equivalent to using a set of bi-allelic markers for a linkage study scan wherein the set of markers systematically scans or "covers" a two-dimensional region having dimensions of chromosomal location and least common allele frequency. (Such a two-dimensional region can be thought of as an area in an x-y plot or a group of squares on a chessboard.)

In addition, the inventor's calculations and observations indicate that bi-allelic markers having least common allele frequencies less than 0.3, 0.2 or even less than 0.1 have an important place in linkage studies using association-based linkage tests. This is markedly different from Kruglyak's information content evaluation of bi-allelic markers for use in linkage studies, in which bi-allelic markers with least common allele frequencies less than 0.3 or 0.2 are viewed unfavorably.⁴

In addition, the two-dimensional linkage study techniques do not necessarily favor using markers in a scan that are about evenly spaced along a chromosome as in the conventional techniques. This is because conventional techniques suffer from a kind of one-dimensional view or lack of depth perception. In the conventional techniques, a marker can look very close to a gene's location in terms of chromosomal location, but the marker can be very far from the gene's location in the new two-dimensional view used by versions of the invention.

It is as if the conventional 1D techniques look at a chessboard from on edge. Markers and a gene which are on different squares of the board, but in the same column of squares, look very close to each other when the board is looked at from on edge. But when the board is looked at from the top in 2D (two dimensions), markers which looked very close to each other and to the gene before (when looking from on edge) can be seen to be very far from the gene.

Further Implications of the Two-dimensional Linkage Study Perspective

These two-dimensional techniques work when multiple genes cause a genetic characteristic and are effective in searching for these genes. A two-dimensional bi-allelic marker "covering" or scanning approach also increases the power of linkage studies using other association-based linkage tests such as the AFBAC method, the haplotype relative risk (HRR) method¹² and comparison of marker allele frequencies in disease cases and unrelated controls.¹³ (These references¹²,¹³ are not admitted to being prior art with respect to the present invention by their mention in this background.)

Patents That May Be Helpful In Starting A Search Of The Background

Some patents that are in the same general areas as versions of the invention are cited here: US Patent Number 5,667,976, Solid supports for nucleic acid hybridization assays. Published International Application WO 98/20165, Biallelic Markers. Published International Application WO 98/07887, Methods for treating bipolar mood disorder associated with markers on chromosome 18 p. US Patent Number 5,552,270, Methods of DNA sequencing by hybridization based on optimizing concentration of matrix-bound oligonucleotide and device for carrying out same. No patent in this paragraph is admitted to being prior art with respect to the present invention by its mention in this background.



¹² Falk, C.T. and Rubinstein, P.: Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Annals of Human Genetics, 1987, vol. 51, pp. 227-233.

¹³ Bell, G.I., Horita, S. and Karam, J.H.: A polymorphic locus near the human insulin gene is associated with insulin-dependent diabetes mellitus. Diabetes, 1984, vol. 33, pp. 176-183.

Brief Description of Some Concepts Used By Versions of the Invention



Versions of the present invention make use of the novel concept of systematically covering a region on a two-dimensional map, similar to an x-y graph, with bi-allelic markers. The x axis on this map is the chromosomal location dimension and the y axis of the map is the least common allele frequency dimension. This two-dimensional map is called a CL-F map in this application. (CL stands for chromosomal location and F stands for least common allele frequency.) Each point on a CL-F map has two coordinates: a chromosomal location coordinate and a frequency coordinate. A point on a CL-F map is called a CL-F point.

Any one bi-allelic polymorphism (marker or gene) is viewed as being located at a particular CL-F point on a CL-F map. The chromosomal location of the polymorphism is the chromosomal location coordinate of the point, and the least common allele frequency of the polymorphism is the frequency coordinate of the point. The chromosomal location coordinate of a CL-F point is given in units of centiMorgans or base pairs or an equivalent thereof, and the least common allele frequency coordinate of a CL-F point is given as a value between 0 and 0.5 inclusive, such as 0.2.

Distances between any two CL-F points on a CL-F map are given in terms of two numbers: chromosomal location distance and frequency distance. The first number is the distance in the horizontal, chromosomal location direction. This first number is the chromosomal location distance. The second number is the distance in the vertical, frequency direction. This second number is the frequency distance. For example, the CL-F distance δ is given by two numbers, δcl (chromosomal location distance) and δf (frequency distance). This is represented as δ = [δcl, δf].
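The CL-F point and the two-number CL-F distance can be sketched as a small data structure. The class and function names below are illustrative only, and the marker and gene coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CLFPoint:
    """A point on a CL-F map (names here are illustrative)."""
    location_cm: float  # chromosomal location coordinate, in centiMorgans
    frequency: float    # least common allele frequency, 0 to 0.5 inclusive

    def __post_init__(self):
        if not 0.0 <= self.frequency <= 0.5:
            raise ValueError("least common allele frequency must be in [0, 0.5]")


def clf_distance(a: CLFPoint, b: CLFPoint):
    """Return the CL-F distance as the pair (delta_cl, delta_f)."""
    return (abs(a.location_cm - b.location_cm), abs(a.frequency - b.frequency))


# A marker that is close to a gene in chromosomal location but far from it
# in the frequency dimension.
marker = CLFPoint(location_cm=12.0, frequency=0.45)
gene = CLFPoint(location_cm=12.5, frequency=0.10)
delta_cl, delta_f = clf_distance(marker, gene)
print(delta_cl, round(delta_f, 2))  # 0.5 0.35
```

Keeping the two components separate, rather than collapsing them into one number, matches the two-number representation given above and avoids having to choose a relative scale between centiMorgans and allele frequency.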

The "clustering" of bi-allelic markers near a particular CL-F point is discussed in terms of the number of markers within a particular CL-F distance of the point. For example, if each of N bi-allelic markers is separated from the point by a CL-F distance of less than or equal to δ, then the point is said to be N covered by the markers to within the distance δ. (N being an integer number.)
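The N-covering of a point can be sketched as a simple count. Two assumptions are made here and are not stated in the text above: a marker is treated as within δ = [δcl, δf] of a point when both components are within their respective limits, and "N covered" is read as "at least N markers within δ". The marker coordinates are hypothetical.

```python
def coverage_count(point, markers, delta):
    """Count markers within CL-F distance delta of point.

    point, markers: (location_cM, least_common_allele_freq) tuples
    delta: (delta_cl, delta_f); a marker counts only if BOTH components of
           its distance from the point are within their limits (assumption)
    """
    d_cl, d_f = delta
    return sum(
        1
        for m_cl, m_f in markers
        if abs(m_cl - point[0]) <= d_cl and abs(m_f - point[1]) <= d_f
    )


def is_n_covered(point, markers, delta, n):
    """True if at least n markers lie within delta of point (assumption)."""
    return coverage_count(point, markers, delta) >= n


markers = [(10.0, 0.20), (10.4, 0.25), (10.2, 0.45), (14.0, 0.22)]
point = (10.1, 0.22)
delta = (0.5, 0.05)
print(coverage_count(point, markers, delta))   # 2
print(is_n_covered(point, markers, delta, 2))  # True
```

The third marker is near the point in location but not in frequency, and the fourth is near in frequency but not in location, so neither counts toward the covering.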

A region on a CL-F map is called a CL-F region. A CL-F region is a collection of one or more CL-F points. Some systematic methods of covering a CL-F region with bi-allelic markers are discussed in terms of the number of markers that are near each point in the region. For example, if each CL-F point in a CL-F region is N covered to within a CL-F distance δ by a subset of a set (or group) of bi-allelic markers, then the region is said to be N covered by the set (or group) of bi-allelic markers to within the distance δ.
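Checking whether a CL-F region is N covered follows directly from the point-wise definition. The sketch below treats the region as a finite collection of sample points and uses the same component-wise reading of δ; both are simplifying assumptions, and the coordinates are hypothetical.

```python
def region_is_n_covered(region_points, markers, delta, n):
    """True if every sampled point of the region has at least n markers
    within CL-F distance delta = (delta_cl, delta_f)."""
    d_cl, d_f = delta
    for cl, f in region_points:
        near = sum(
            1
            for m_cl, m_f in markers
            if abs(m_cl - cl) <= d_cl and abs(m_f - f) <= d_f
        )
        if near < n:
            return False
    return True


# Four corner points of a small region and a single marker at its centre.
region = [(10.0, 0.20), (11.0, 0.20), (10.0, 0.30), (11.0, 0.30)]
markers = [(10.5, 0.25)]
delta = (0.6, 0.06)
print(region_is_n_covered(region, markers, delta, 1))  # True
print(region_is_n_covered(region, markers, delta, 2))  # False
```

A continuous region would need a sufficiently fine sampling of points for this check to be meaningful.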

A set (or group) of bi-allelic markers that covers a CL-F region or a CL-F point is referred to as a set (or group) of bi-allelic covering markers in this application.

The inventor discovered that when a bi-allelic marker and a bi-allelic gene are located close together on a CL-F map, then the power of association-based linkage tests to detect linkage disequilibrium between the marker and a trait-causing gene (when present) increases greatly. Systematically covering a CL-F region that is the location of an unknown trait-causing bi-allelic gene with bi-allelic covering markers therefore greatly increases the power of association-based linkage tests to detect linkage disequilibrium (when present) between one or more of the covering markers and the gene.

A CL-F matrix is a matrix of rectangular cells of the same length and the same width on a CL-F map. Stipulating that a certain number of covering markers be placed in each cell of the matrix is a method of illustrating particular types of systematic covering of a CL-F region with covering markers.
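A CL-F matrix can be sketched by binning markers into equal-sized rectangular cells and then checking each cell against a stipulated quota. The function names, cell sizes and marker coordinates below are hypothetical.

```python
from collections import Counter


def cell_counts(markers, cl_origin, cell_length, f_origin, cell_width):
    """Count markers per CL-F matrix cell, keyed by (column, row).

    cell_length: cell size along the chromosomal location axis (e.g. cM)
    cell_width:  cell size along the least common allele frequency axis
    """
    counts = Counter()
    for cl, f in markers:
        col = int((cl - cl_origin) // cell_length)
        row = int((f - f_origin) // cell_width)
        counts[(col, row)] += 1
    return counts


def cells_below_quota(counts, n_cols, n_rows, quota):
    """List the cells of an n_cols x n_rows matrix holding fewer than quota markers."""
    return [
        (c, r)
        for c in range(n_cols)
        for r in range(n_rows)
        if counts[(c, r)] < quota
    ]


markers = [(1.0, 0.05), (3.0, 0.07), (6.0, 0.15), (8.0, 0.45)]
counts = cell_counts(markers, cl_origin=0.0, cell_length=5.0,
                     f_origin=0.0, cell_width=0.1)
print(dict(counts))  # {(0, 0): 2, (1, 1): 1, (1, 4): 1}
print(len(cells_below_quota(counts, n_cols=2, n_rows=5, quota=1)))  # 7
```

Markers lying exactly on a cell boundary may fall on either side because of floating-point rounding; a real implementation would need an explicit boundary rule.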

4 The evidence for linkage obtained from two-dimensional linkage studies is essentially two-dimensional

5 in nature and it is possible to use this two-dimensional information by essentially graphing quantitative

6 evidence for linkage as a function of position in the x-y plane. For example, if quantitative evidence for

7 linkage is represented in the z dimension of a typical three-dimensional x-y-z plot, wherein the x and y

8 dimensions are chromosomal location and least common allele frequency respectively, then it is

9 possible to conceptualize evidence for linkage as occurring in a "hump" or "humps" in the z dimension.

10 And it is possible to analyze the data to find the CL-F location (in the x-y plane) of the peak(s) of the

11 "hump(s)", thus helping to localize a trait causing gene to the CL-F locale of the peak(s) of the

12 "hump(s)".
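As a rough illustration of locating the peak of such a "hump", the following sketch assumes the quantitative evidence for linkage has already been computed at a finite set of CL-F points. The function name `find_peaks`, the dictionary layout and the numbers are hypothetical and are not part of the application; the sketch simply ranks grid points by their evidence value.

```python
def find_peaks(evidence, top=1):
    """Return the CL-F points with the highest evidence for linkage.

    evidence maps (chromosomal_location, least_common_allele_frequency)
    to a quantitative evidence value (the z dimension of the plot).
    This ranks sampled points; it does not fit a smooth surface.
    """
    ranked = sorted(evidence.items(), key=lambda item: item[1], reverse=True)
    return [point for point, _ in ranked[:top]]


# Hypothetical evidence values at three CL-F points.
evidence = {
    (1000, 0.1): 1.2,
    (2000, 0.2): 3.5,
    (3000, 0.2): 2.0,
}
peak = find_peaks(evidence)[0]
```

A finer grid of CL-F points, or a smoothing step before ranking, would be natural refinements, but neither is specified in the text above.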

13 Versions of the invention also make use of multi-allelic genes and/or markers. It is always possible to

14 combine the alleles of a multi-allelic polymorphism (marker or gene) so that the polymorphism acts

15 mathematically as if it were a bi-allelic polymorphism. In effect, it is always possible to mathematically

16 transform a multi-allelic marker or gene to act bi-allelic. Similarly, two or more markers can always be

17 mathematically combined to form a mathematical marker that acts like a single bi-allelic marker. And

18 two or more genes can always be mathematically combined to form a mathematical gene that acts like

19 a single bi-allelic gene. In this application a mathematical bi-allelic marker formed mathematically from

20 one or more markers is called a bi-allelic marker equivalent or BME; and a mathematical bi-allelic gene

21 formed mathematically from one or more genes is called a bi-allelic gene equivalent or BGE.

22 The term true marker or gene is used to distinguish a marker or gene in the ordinary sense from a bi-

23 allelic marker equivalent (BME) or bi-allelic gene equivalent (BGE). The term true allele is used to

24 distinguish an allele in the ordinary sense from a mathematical allele of a BME or BGE. A mathematical

25 allele of a BME or BGE is referred to as an allele equivalent. An allele equivalent is a combination of

26 one or more true alleles or one or more haplotypes. 

27 Versions of the invention make use of genes and/or markers which are not exactly bi-allelic. These

28 genes or markers are approximately bi-allelic. A gene or marker that is approximately bi-allelic almost

29 always occurs in one of two allele forms; very rarely, however, it occurs in a different allele form.

30 Various versions of the invention are for genotyping individuals at markers which systematically cover

31 CL-F regions or for obtaining sample allele frequency data (such as from pooled DNA) for a sample of

32 individuals for markers which systematically cover CL-F regions. Various versions of the invention are

33 for oligonucleotides used for genotyping individuals at markers which systematically cover CL-F regions

34 or are for obtaining sample allele frequency data (such as from pooled DNA) for a sample of individuals

35 for markers which systematically cover CL-F regions.
37 Definitions

39 For the purposes of the description and claims the terms used herein will have their generally accepted

40 definition unless otherwise specified.

1 The term creature means any organism that is living or was alive at one time. This includes both plants 

2 and animals. 

3 The term species is used in its broadest sense and includes but is not limited to: 1) biological (genetic)

4 species, 2) paleospecies (successional species), 3) taxonomic (morphological; phenetic) species

5 including species hybrids such as mules, 4) microspecies (agamospecies), and 5) biosystematic species

6 (coenospecies, ecosystem species).

7 A genetic characteristic is an observable or inferable inherited genetic characteristic or inherited

8 genetic trait including a biochemical or biophysical genetic trait. For example, an inherited disease is a

9 genetic characteristic, and a predisposition to an inherited disease is a genetic characteristic. A phenotypic

10 characteristic, phenotypic property or character is a genetic characteristic.

11 In this application, the term gene means a polymorphism that takes on one or more allele forms and

12 which causes or determines an inherited genetic characteristic or genetic trait. The term gene does not 

13 mean an entire gene structure with a promoter region, a terminator region, introns, and other parts of an 

14 entire gene structure. In this application the term gene means a polymorphism that determines or 

15 causes an inherited genetic characteristic and that is part of an entire gene structure in some cases.

16 Each genetic characteristic of a creature is determined by one or more of the creature's genes, 

17 wherein the term gene is defined as above. 

18 A segment is a segment of a chromosome. 

19 A subrange is a subrange of the least common allele frequency range 0 to 0.5 inclusive. 

20 The width of a subrange is the difference between the upper and lower limits of the subrange. For 

21 example, the width of the subrange 0.1 to 0.4 is 0.4-0.1 = 0.3 . 

22 A chromosomal location-least common allele frequency map is a two-dimensional plot (similar to 

23 an x-y graph) wherein the vertical axis(y axis) represents least common allele frequency and the 

24 horizontal axis(x axis) represents chromosomal location. A chromosomal location-least common allele 

25 frequency map is referred to as a CL-F map. 

26 Points on a CL-F map are referred to as CL-F points. Points on a CL-F map have a chromosomal 

27 location coordinate and a least common allele frequency coordinate. CL-F points represent possible 

28 chromosomal location and least common allele frequency values for individual bi-allelic markers and 

29 genes. Any particular point on a CL-F map is directly opposite a value on the map's least common 

30 allele frequency axis (y axis) and is directly opposite a value on the map's chromosomal location axis (x

31 axis). These two values are the two coordinates of the particular point: (1) the chromosomal location

32 coordinate and (2) the least common allele frequency coordinate. A marker or gene located at a 

33 particular point on a CL-F map is physically located at the chromosomal location given by the 

34 chromosomal location coordinate of the point and the marker or gene's least common allele frequency 

35 is the least common allele frequency coordinate of the point. These two coordinates are designated by 

36 the term ( x, y ) wherein x is the value of the chromosomal location coordinate and y is the value of the 

37 least common allele frequency coordinate. 

38 A particular CL-F map may be large or small. For example it is possible for the chromosomal 

39 location coordinates of CL-F points on a particular CL-F map to range over an entire chromosome ( for 

40 example human chromosome number 6). Alternatively it is possible for the chromosomal location 
1 coordinates of CL-F points on a particular CL-F map to range over more than one chromosome, for

2 example all the human chromosomes, human chromosomes numbers 1 through 22 and X and Y.

3 Similarly it is possible for the chromosomal location coordinates of CL-F points on a particular CL-F 

4 map to range over all the chromosomes of a species under study. Alternatively, it is possible for the 

5 chromosomal location coordinates of CL-F points on a particular CL-F map to range over a very small 

6 segment of chromosome, for example a segment of length 100,000 bp or less. Similarly it is possible for

7 the least common allele frequency coordinates of CL-F points on a particular CL-F map to range over 

8 the entire least common allele frequency range 0 to 0.5. Alternatively it is possible for the least common 

9 allele frequency coordinates of CL-F points on a particular CL-F map to range over a subrange or 

10 subranges of the range 0 to 0.5, for example the subrange 0.1 to 0.2. 

11 If a bi-allelic polymorphism (marker or gene) is said to be located at a particular CL-F point then 

12 the polymorphism's chromosomal location is the chromosomal location coordinate of the point and the 

1 3 polymorphism's least common allele frequency is the least common allele frequency coordinate of the 

14 point. 

15 The chromosomal location distance between two CL-F points on a CL-F map is the absolute

16 difference between the two chromosomal location coordinates of the two points.

17 The frequency distance between two CL-F points on a CL-F map is the absolute difference between

18 the two least common allele frequency coordinates of the two points.

19 The CL-F distance between two CL-F points is given in terms of two parts or two components: (1)

20 chromosomal location distance and (2) frequency distance. This is denoted as [D_CL, D_F], wherein D_CL is

21 the chromosomal location distance between the two points and D_F is the frequency distance between

22 the two points. For example [500 bp, 0.3] is an example of a CL-F distance.

23 If a first CL-F distance is less than or equal to a second CL-F distance then the chromosomal 

24 location distance component of the first CL-F distance is less than or equal to the chromosomal location 

25 distance component of the second CL-F distance AND the frequency distance component of the first 

26 CL-F distance is less than or equal to the frequency distance component of the second CL-F distance. 

27 For example, suppose a first CL-F distance is [x1, y1] and a second CL-F distance is [x2, y2]. If the first CL-F

28 distance is said to be less than or equal to the second CL-F distance, then x1 is less than or equal to x2

29 AND y1 is less than or equal to y2.
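The two-component CL-F distance and its component-wise comparison can be sketched as follows. This is only an illustration: the names `CLFPoint`, `CLFDistance`, `clf_distance` and `distance_leq` are hypothetical and do not appear in the application, and chromosomal location is assumed to be measured in base pairs along a single chromosome.

```python
from typing import NamedTuple


class CLFPoint(NamedTuple):
    location_bp: float  # chromosomal location coordinate (x)
    lcaf: float         # least common allele frequency coordinate (y), 0 to 0.5


class CLFDistance(NamedTuple):
    d_cl: float  # chromosomal location distance component
    d_f: float   # frequency distance component


def clf_distance(p: CLFPoint, q: CLFPoint) -> CLFDistance:
    """Absolute difference of each coordinate, kept as two separate components."""
    return CLFDistance(abs(p.location_bp - q.location_bp), abs(p.lcaf - q.lcaf))


def distance_leq(first: CLFDistance, second: CLFDistance) -> bool:
    """True only if BOTH components of first are <= those of second."""
    return first.d_cl <= second.d_cl and first.d_f <= second.d_f


d = clf_distance(CLFPoint(1000, 0.1), CLFPoint(1500, 0.4))
```

Note that this ordering is partial: [400 bp, 0.35] is neither less than or equal to [500 bp, 0.3] nor the reverse.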

30 The term "bi-allelic covering marker(s)" or "covering marker(s)" is used to distinguish a particular

31 bi-allelic marker or particular bi-allelic markers from other markers. The term is being used simply to

32 avoid ambiguity. In general the term covering marker(s) can be thought of as a marker or markers 

33 which have been chosen to cover or serve to cover a CL-F point or a CL-F region. 

34 If a CL-F point is said to be N covered to within a CL-F distance δ by one or more bi-allelic

35 covering markers then the CL-F distance between each of N or more of the covering markers and the

36 point is less than or equal to δ, wherein N is an integer number greater than or equal to 1.

37 If a CL-F point is said to be N covered to within a CL-F distance of about (or approximately) δ by

38 one or more bi-allelic covering markers then the CL-F distance between each of N or more of the

39 covering markers and the point is less than or equal to about (or approximately) δ, wherein N is an

40 integer number greater than or equal to 1.
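The N-covering test for a single CL-F point can be sketched as a simple count. The function name `is_n_covered` and the tuple layout are assumptions made for illustration; points and markers are (chromosomal location, least common allele frequency) pairs on one chromosome, and the covering distance is given as a pair of its two components.

```python
def is_n_covered(point, markers, delta, n=1):
    """Return True if at least n markers lie within the CL-F distance delta of point.

    point   : (location, lcaf) tuple
    markers : iterable of (location, lcaf) tuples for bi-allelic covering markers
    delta   : (delta_cl, delta_f), the two components of the covering distance
    """
    x, y = point
    delta_cl, delta_f = delta
    close = sum(
        1
        for (mx, my) in markers
        if abs(mx - x) <= delta_cl and abs(my - y) <= delta_f
    )
    return close >= n


# Hypothetical markers: two near the point, one far away.
markers = [(1000, 0.10), (1200, 0.12), (5000, 0.40)]
point = (1100, 0.11)
```

With a covering distance of [200 bp, 0.05], the point above is 2 covered but not 3 covered.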
1 A CL-F region is a group of CL-F points. A CL-F region is a region that is or can be represented on a

2 CL-F map. A particular CL-F region may be large or small. For example the chromosomal location

3 coordinates of CL-F points in a particular CL-F region can range over an entire chromosome (for

4 example human chromosome number 6). Alternatively the chromosomal location coordinates of CL-F

5 points in a particular CL-F region can range over more than one chromosome, for example all the

6 human chromosomes, human chromosomes numbers 1 through 22 and X and Y. Similarly the

7 chromosomal location coordinates of CL-F points in a particular CL-F region can range over all the 

8 chromosomes of a species under study. Alternatively, the chromosomal location coordinates of CL-F 

9 points in a particular CL-F region can range over only a small segment of chromosome, for example a 

10 segment of length 100,000 bp or less. Similarly the least common allele frequency coordinates of CL-F 

1 1 points in a particular CL-F region can range over the entire least common allele frequency range 0 to 

12 0.5. Alternatively the least common allele frequency coordinates of CL-F points in a particular CL-F 

13 region can range over only a very small subrange, for example the subrange 0.1 to 0.2 or less.

14 The length of a CL-F region is the largest chromosomal location distance between any two CL-F 

15 points in the region. 

16 The width of a CL-F region is the largest frequency distance between any two CL-F points in the 

17 region. 

18 A CL-F region that is path connected is contiguous and it is possible to draw a continuous path 

19 between any two points, wherein each point in the path is also in the region. 

20 If a CL-F region is said to be systematically covered by two or more bi-allelic covering markers

21 then each point in the region is within a small CL-F distance of one or more of the covering markers,

22 wherein the magnitude of the small CL-F distance is such that there is increased power of an 

23 association based linkage test to detect evidence for linkage between one or more covering markers 

24 and a gene that is located at a point in the CL-F region, when linkage disequilibrium is present between 

25 the gene and one or more of the covering markers. 

26 If a CL-F region is said to be N covered to within a CL-F distance δ by one or more covering

27 markers then each point in the region is N covered to within the CL-F distance δ by the one or more

28 covering markers, wherein N is an integer greater than or equal to one.

29 If a CL-F region is said to be N covered to within a CL-F distance of about (or approximately) δ

30 by one or more covering markers then each point in the region is N covered to within the CL-F

31 distance of about (or approximately) δ by the one or more covering markers, wherein N is an integer

32 greater than or equal to one.

33 The CL-F distance δ is known as the covering distance if a CL-F point or CL-F region is N covered

34 to within a CL-F distance δ.

35 A CL-F covering distance δ has two components: (1) a chromosomal location distance usually

36 denoted by δ_CL and (2) a least common allele frequency distance (abbreviated as frequency distance)

37 usually denoted by δ_F, i.e. δ = [δ_CL, δ_F].

38 The length of a group of covering markers is determined as follows. The absolute chromosomal

39 location distance between each pair of markers in the group is determined. The greatest absolute

1 chromosomal location distance between any pair of markers in the group is the length of the group of

2 covering markers.

3 A group of covering markers located on one chromosome can be ordered as a sequence of 

4 markers starting with the marker closest to one end of the chromosome and going toward the other 

5 end of the chromosome. This is denoted for example as m_1, m_2, m_3, ..., m_(N-2), m_(N-1), m_N, wherein N is

6 the number of markers in the group. (The chromosomal location distance between m_1 and m_N is greater

7 than the chromosomal location distance between any other pair of markers in the group and this

8 distance is the length of the group of markers.) The chromosomal location distance between two

9 successive markers in the group, i.e. between m_R and m_(R+1), is a chromosomal intermarker

10 distance. (There are N-1 chromosomal intermarker distances for a group of N covering markers.)

11 The average chromosomal intermarker distance for a group is calculated by dividing the length of

12 the group by (N-1), wherein N is the number of covering markers in the group.
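The length of a group of covering markers, its intermarker distances and their average follow directly from the marker locations. The sketch below assumes all markers are on one chromosome with locations given in base pairs; the function names are hypothetical.

```python
def group_length(locations):
    """Greatest chromosomal location distance between any pair of markers."""
    return max(locations) - min(locations)


def intermarker_distances(locations):
    """Distances between successive markers after ordering along the chromosome."""
    ordered = sorted(locations)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def average_intermarker_distance(locations):
    """Length of the group divided by (N - 1), N being the number of markers."""
    if len(locations) < 2:
        raise ValueError("at least two markers are needed")
    return group_length(locations) / (len(locations) - 1)


# Hypothetical marker locations in bp, deliberately unsorted.
locations = [100, 400, 250, 1000]
```

Because the intermarker distances sum to the group length, dividing the length by N-1 gives the same value as averaging the N-1 individual distances.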

1 7 A segment-subrange pair is the pair formed by pairing a segment of a chromosome and a subrange 

1 8 of the least common allele frequency range 0 to 0.5. 

19 The term segment-subrange is used as a short version of the term segment-subrange pair. (A 

20 segment-subrange is a rectangular region on a CL-F map or a rectangular CL-F region, see below.) 

21 If one or more bi-allelic markers are said to be within(or in) a segment-subrange then each of the 

22 markers is located on (or in) the chromosomal segment of the segment-subrange(pair) and each of the 

23 markers' least common allele frequencies is in the subrange of the segment-subrange(pair). (And each 

24 of the markers is located within the rectangular region defined by the segment-subrange on a CL-F 

25 map.) 

26 Alternatively, if a segment-subrange is said to contain one or more markers or to contain the 

27 location of one or more markers then each of the markers is located on (or in) the chromosomal 

28 segment of the segment-subrange and each of the markers' least common allele frequencies is in the 

29 subrange of the segment-subrange. (And each of the markers is located within or is within the 

30 rectangular region on a CL-F map defined by the segment-subrange.) 

31 If one or more CL-F points are said to be within (or in) a segment-subrange then each of the points

32 is located within the rectangular region defined by the segment-subrange on a CL-F map or on the 

33 segment-subrange's borders. 

34 The length of a segment-subrange is the length of the segment of the segment-subrange. 

35 The width of a segment-subrange is the width of the subrange of the segment-subrange. 

36 The area of a segment-subrange is the segment subrange's length multiplied by the segment 

37 subrange's width. 

38 If a CL-F region is said to comprise a segment-subrange, then each point in the segment-subrange 

39 is in(or included in) the CL-F region. 
1 If a CL-F region is said to comprise an area of greater than or equal to X multiplied by Y, then the

2 CL-F region comprises one or more nonoverlapping segment-subranges, and the sum of the areas of

3 the segment-subranges is greater than or equal to X multiplied by Y. 

4 A CL-F matrix is a collection of segment-subranges, wherein each segment-subrange of the collection

5 has the same width and the same length. Each segment-subrange in the collection (or the matrix) is a

6 CL-F matrix cell. Any one CL-F matrix cell in a CL-F matrix shares two or more of the cell's borders

7 with two or more other cells in the matrix. And all the cells in a CL-F matrix together form a single

8 segment-subrange. A CL-F matrix is characterized by the length and the width of the cells in the matrix

9 denoted by length x width, or L_MC x W_MC, wherein L_MC is the length of each cell in the matrix and W_MC is

10 the width of each cell in the matrix. A CL-F matrix is also characterized by the number of rows of cells,

11 R_M, in the matrix. And a CL-F matrix is characterized by the number of columns of cells, C_M, in the

12 matrix. There are two or more cells in a CL-F matrix. A CL-F matrix is also characterized by the point of

13 origin of the matrix, denoted by (d_0, f_0). The point of origin of a CL-F matrix is at any chromosomal

14 location and d_0 takes on any reasonable value in an entire species genome. The point of origin of a

15 CL-F matrix is at any one value in the least common allele frequency range 0 to 0.5. (A CL-F matrix is

16 similar to the squares of a chessboard or to equal rectangular floor tiles that are all oriented in the same

17 direction and cover a rectangular floor. One corner of the matrix is the matrix's point of origin.)

18 The width of each cell of a particular CL-F matrix is any value greater than zero and less than 0.5.

19 The width of a cell is often denoted by W_MC.

20 Any length in chromosomal location distance units is chosen for the length of each cell of a particular

21 CL-F matrix. The length of a cell is often denoted by L_MC.

22 The centerpoint of a CL-F matrix cell is in the center of the cell. The centerpoints of a CL-F matrix form

23 a matrix centerpoint lattice. Each point of a matrix centerpoint lattice is separated by a CL-F distance

24 of [0, W_MC] or [L_MC, 0] from two or more neighboring centerpoints.
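A CL-F matrix and its centerpoint lattice can be sketched as below. Several things here are assumptions for illustration only: the point of origin is taken to be the corner with the lowest chromosomal location and lowest frequency, a point lying exactly on a shared border is assigned to the cell with the higher index, and the function names are hypothetical.

```python
def matrix_centerpoints(origin, cell_length, cell_width, n_rows, n_cols):
    """Centerpoints of every cell, indexed as [row][column].

    Rows run along the frequency axis, columns along the chromosomal axis.
    """
    d0, f0 = origin
    return [
        [(d0 + (col + 0.5) * cell_length, f0 + (row + 0.5) * cell_width)
         for col in range(n_cols)]
        for row in range(n_rows)
    ]


def cell_index(point, origin, cell_length, cell_width, n_rows, n_cols):
    """Return (row, column) of the cell containing point, or None if outside."""
    x, y = point
    d0, f0 = origin
    col = int((x - d0) // cell_length)
    row = int((y - f0) // cell_width)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return (row, col)
    return None


# Hypothetical matrix: 4 rows x 3 columns of cells, each 1000 bp x 0.125.
centers = matrix_centerpoints((0, 0.0), 1000, 0.125, 4, 3)
```

Neighboring centerpoints in the same row differ by [1000, 0] and those in the same column by [0, 0.125], matching the lattice spacing described above.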

25 If one or more bi-allelic markers are in (or within) the segment-subrange that is a CL-F matrix

26 cell, then each of the markers is in or within the CL-F matrix cell.

27 If one or more CL-F points are in (or within) a CL-F matrix, then each of the points is in or within a cell

28 of the matrix.

29 If a CL-F region comprises a CL-F matrix, then each point that is in the matrix is also in the region. 

30 If a CL-F region is a CL-F matrix, then the region consists of the points that are in the matrix. 

31 If two CL-F matrix cells share a common border, then the two CL-F matrix cells are in contact.

32 If two CL-F matrix cells share a common corner, then the two CL-F matrix cells are touching. (Two

33 cells that are in contact are also touching.) 

34 If a group of CL-F points is connected to within a CL-F distance [X,Y], then for any two points in

35 the group, denoted p_1 and p_R, there is an ordered sequence of points in the group denoted p_1, p_2,

36 p_3, ..., p_(R-1), p_R, R being an integer greater than or equal to 2, wherein the CL-F distance between

37 each point in the sequence and the next point in the sequence is less than or equal to [X,Y]. The

38 distance [X,Y] is the connecting distance. (Put in simple terms, if a group of points is connected to

39 within [X,Y], then there is a path between each pair of points in the group, the path consisting of a

40 series of steps, wherein each step in the path is a movement between two points in the group that are
separated by a CL-F distance of less than or equal to [X,Y]. A simple group of points connected to
within a CL-F distance of [X,Y] is a group of three points, wherein each point in the group is within a CL-F
distance of less than or equal to [X,Y] of another point in the group. The concept of connectivity
introduced here is similar to the basic concept of connectivity in mathematical graph theory.)
If a group of N markers is connected to within a CL-F distance [X,Y], wherein N is an integer, then
each of the markers is located at one point of a group of N points, the group of N points being
connected to within a CL-F distance [X,Y].
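Since the text compares this notion to connectivity in graph theory, one way to test it is a breadth-first search over a graph whose edges join points separated by no more than [X,Y]. This is a sketch under that reading; the function name and the tuple layout are hypothetical, and all points are assumed to lie on one chromosome.

```python
from collections import deque


def is_connected_within(points, connecting_distance):
    """True if every pair of points is joined by a path of steps no longer than [X, Y].

    points              : list of (location, lcaf) tuples
    connecting_distance : (x_max, y_max), the two components of [X, Y]
    """
    if len(points) < 2:
        return True
    x_max, y_max = connecting_distance
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(len(points)):
            if j in seen:
                continue
            step_ok = (
                abs(points[i][0] - points[j][0]) <= x_max
                and abs(points[i][1] - points[j][1]) <= y_max
            )
            if step_ok:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(points)


# Three hypothetical points forming a chain; the end points are not directly adjacent.
chain = [(0, 0.10), (100, 0.15), (200, 0.20)]
```

The chain above is connected to within [100, 0.06] even though its two end points are 200 bp apart, because each step in the path is short enough.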

If two bi-allelic markers are said to be in extreme positive disequilibrium then d is approximately
equal to dmax for the two markers, which for the purposes of this definition are designated marker M
with least common allele A and marker m with least common allele B. According to standard
usage, the disequilibrium coefficient (d) is defined by the equation d = f(AB) - f(A)f(B), where f(A) and f(B)
are defined as the population frequencies of alleles A and B, respectively, and f(AB) is the population
frequency of the AB haplotype. And dmax is defined as the maximum possible positive value of d
assuming the allele frequencies of A and B are f(A) and f(B), and thus dmax = q - f(A)f(B), where q is the
lesser of f(A) and f(B). (In this application d is used to represent the disequilibrium coefficient; the
symbol δ is often used in scientific papers to represent the disequilibrium coefficient.)
If a pair of markers is said to be in extreme positive disequilibrium, then the two markers of the
pair are in extreme positive disequilibrium.
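The quantities d and dmax can be computed directly from the three population frequencies. In the sketch below the function names are hypothetical, and the tolerance `rel_tol` used to decide when d is "approximately equal" to dmax is an assumption, since the text does not quantify "approximately".

```python
def disequilibrium(f_ab, f_a, f_b):
    """d = f(AB) - f(A) f(B)."""
    return f_ab - f_a * f_b


def d_max(f_a, f_b):
    """dmax = q - f(A) f(B), where q is the lesser of f(A) and f(B)."""
    return min(f_a, f_b) - f_a * f_b


def in_extreme_positive_disequilibrium(f_ab, f_a, f_b, rel_tol=0.05):
    """True if d is within rel_tol (an assumed threshold) of dmax."""
    dm = d_max(f_a, f_b)
    if dm <= 0:
        return False
    return disequilibrium(f_ab, f_a, f_b) >= (1 - rel_tol) * dm


# Hypothetical frequencies: allele A (0.2) always occurs with allele B (0.3).
d = disequilibrium(0.2, 0.2, 0.3)
```

With f(A) = 0.2, f(B) = 0.3 and f(AB) = 0.2, both d and dmax equal 0.14, so the pair would be classed as being in extreme positive disequilibrium under this tolerance.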

If a pair of bi-allelic markers is said to be redundant within distance D then the two markers of the
pair are in extreme positive disequilibrium and the two markers are located on the same chromosome
and the two markers are located within a CL-F distance D of each other on a CL-F map, wherein D is a
specified distance and D has two components, a chromosomal location distance component D_CL and a
frequency distance component D_F; D = [D_CL, D_F].

An allele equivalent (AE) is a group of one or more "haplotype values" of one or more polymorphisms
of the same type, either markers or genes. (For the purposes of this application a haplotype value of
one polymorphism is equivalent to an allele value at the one polymorphism.) The group of haplotype
values is then analyzed as if the group is a single allele at a bi-allelic polymorphism; the group of
haplotype values acts as a single allele at a bi-allelic polymorphism; the collection of the one or more
polymorphisms upon which the haplotype values are based acts as a bi-allelic polymorphism; the
collection of one or more polymorphisms forms a bi-allelic polymorphism equivalent (PE) that acts as
a bi-allelic polymorphism. The polymorphism equivalent has (or possesses) the allele equivalent.
The allele equivalent belongs to the polymorphism equivalent. In this application, each polymorphism
equivalent is a bi-allelic marker equivalent (BME) or a bi-allelic gene equivalent (BGE).
A bi-allelic marker equivalent (BME) is one or more markers and a grouping of the haplotype values
of the one or more markers into two groups (e.g. group I and group II). (For the purposes of this
application a "haplotype value" of one marker is equivalent to an allele at the one marker.) The one or
more markers and the two groups of haplotype values of the one or more markers are then analyzed as
if the one or more markers are a single bi-allelic marker with alleles I and II. Each group of the groups I
and II is an allele equivalent. For example, a multi-allelic microsatellite marker has its multiple alleles
grouped into two groups and the microsatellite marker and these two groups of alleles then act
1 equivalent to a bi-allelic marker and are analyzed as if the microsatellite marker with the two groups is

2 bi-allelic (for an example of this see McGinnis, Ewens & Spielman, Genetic Epidemiology 1995; 12(6):

3 637-40, which is incorporated herein by reference).

4 Also for example, two or more multi-allelic markers have their haplotypes separated into two groups of

5 haplotypes and the multi-allelic markers with their two groups of haplotypes are analyzed as if they

6 were a single bi-allelic marker.

7 For example bi-allelic marker A has alleles a and a* and bi-allelic marker B has alleles b and b*. Then

8 the four haplotypes ab, ab*, a*b* and a*b are grouped into two groups, for example group I: ab and a*b*

9 and group II: ab* and a*b. Then a BME formed by markers A and B takes on values of group I (or I) for

10 haplotypes ab or a*b* or group II (or II) for the haplotypes ab* or a*b, and the two markers and the two

11 group values (I and II) are analyzed as though they form a single bi-allelic marker (the BME). The same

12 type of reasoning and procedure is extended to 3 or more bi-allelic markers, 3 or more bi-allelic marker

13 equivalents or 2 or more multi-allelic markers.

14 (Logically, of course, the genotype at a BME for an individual is determined by knowing the two

15 haplotype values at the one or more markers that form the BME for each of the individual's two

16 homologous chromosomes that carry the one or more markers. The genotype is then determined by

17 classifying each haplotype as belonging to group I or group II or the equivalent thereof. The three

18 possible genotype values at the BME are I / I, I / II, and II / II or the equivalent thereof.)
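Determining a BME genotype from an individual's two haplotypes can be sketched as a lookup into the two groups. The example reuses the grouping given above for markers A and B (group I: ab and a*b*; group II: ab* and a*b); the function names and the string encoding of haplotypes are hypothetical.

```python
def classify_haplotype(haplotype, group_one, group_two):
    """Map a haplotype value to the allele equivalent 'I' or 'II'."""
    if haplotype in group_one:
        return "I"
    if haplotype in group_two:
        return "II"
    raise ValueError(f"haplotype {haplotype!r} is in neither group")


def bme_genotype(hap_first, hap_second, group_one, group_two):
    """Genotype at the BME from the haplotypes on the two homologous chromosomes."""
    alleles = sorted(
        classify_haplotype(h, group_one, group_two)
        for h in (hap_first, hap_second)
    )
    return " / ".join(alleles)


GROUP_ONE = {"ab", "a*b*"}
GROUP_TWO = {"ab*", "a*b"}
```

Sorting the two allele equivalents makes the heterozygous genotype read the same regardless of which chromosome carries which haplotype.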

19 Similarly, a bi-allelic gene equivalent (BGE) is one or more genes and a grouping of all the haplotype

20 values of the one or more genes into two groups (e.g. group I and group II).

21 For the purposes of the description and claims, the chromosomal location of a polymorphism

22 equivalent is at any point on the smallest chromosomal segment that contains the one or more

23 polymorphisms that form the polymorphism equivalent (PE).

24 The allele frequency of an allele equivalent (AE) is determined as follows. An allele equivalent (AE) 

25 is a group of haplotype values of one or more polymorphisms. The frequency of the allele equivalent is 

26 the sum of the frequencies of the haplotype values in the group that makes up the allele equivalent. 

27 For the purposes of the application, description, claims and definitions the term true allele is used to 

28 distinguish an allele according to standard usage (i.e. at a single polymorphism) from an allele 

29 equivalent (AE). 

30 The least common allele frequency of a bi-allelic polymorphism equivalent (BPE) is determined

31 as follows. Each of the two groups (I and II) of the haplotype values of the one or more polymorphisms

32 which form the BPE is assigned a frequency. The frequency of I is the sum of the frequencies of the

33 haplotype values in group I. And the frequency of II is the sum of the frequencies of the haplotype

34 values in group II. The least of the frequency of I and the frequency of II is the least common allele

35 frequency of the BPE. If the frequency of I and the frequency of II are equal, then the least common 

36 allele frequency of the BPE is the frequency of I or the frequency of II. 
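The calculation just described reduces to two sums and a minimum. In the sketch below the haplotype frequencies are invented for illustration and the function names are hypothetical.

```python
def allele_equivalent_frequency(group, haplotype_frequencies):
    """Sum of the frequencies of the haplotype values in one group."""
    return sum(haplotype_frequencies[h] for h in group)


def bpe_least_common_allele_frequency(group_one, group_two, haplotype_frequencies):
    """The lesser of the frequency of I and the frequency of II."""
    freq_one = allele_equivalent_frequency(group_one, haplotype_frequencies)
    freq_two = allele_equivalent_frequency(group_two, haplotype_frequencies)
    return min(freq_one, freq_two)


# Hypothetical population haplotype frequencies (they sum to 1).
frequencies = {"ab": 0.4, "a*b*": 0.3, "ab*": 0.2, "a*b": 0.1}
lcaf = bpe_least_common_allele_frequency(
    {"ab", "a*b*"}, {"ab*", "a*b"}, frequencies
)
```

Here group I has frequency 0.7 and group II has frequency 0.3, so the least common allele frequency of the BPE is 0.3 (up to floating-point rounding).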

37 For the purposes of the description and claims, the chromosomal location of a bi-allelic marker 

38 equivalent (BME) is at any point on the smallest chromosomal segment which contains the one or 

39 more markers which form the BME. 
1 The chromosomal location distance from a BME to a CL-F point on a CL-F map is the shortest

2 chromosomal location distance from the CL-F point to any one of the one or more markers which form

3 the BME. 

4 The least common allele frequency of a bi-allelic marker equivalent (BME) is determined as 

5 follows. Each of the two groups( I and II) of the haplotype values of the one or more markers which form 

6 the BME is assigned a frequency. The frequency of I is the sum of the frequencies of the haplotype 

7 values in group I. And the frequency of II is the sum of the frequencies of the haplotype values in group 

8 II. The least of the frequency of I and the frequency of II is the least common allele frequency of the 

9 BME. If the frequency of I and the frequency of II are equal, then the least common allele frequency of 

10 the BME is the frequency of I or the frequency of II.

1 1 The frequency distance from a BME to a CL-F point on a CL-F map is the absolute difference 

12 between the least common allele frequency of the BME and the least common allele frequency 

13 coordinate of the CL-F point. 

14 (If a CL-F point on a CL-F map is covered by one or more BMEs to within a distance δ, wherein δ = [δ_CL,

15 δ_F], then the CL-F distance from each of the one or more BMEs to the CL-F point is less than or equal

16 to δ. And the chromosomal location distance from one of the markers which form each BME to the CL-F

17 point is less than or equal to δ_CL. And the frequency distance from each of the one or more BMEs to the

18 CL-F point is less than or equal to δ_F.)
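The distance from a BME to a CL-F point differs from the single-marker case only in that the chromosomal component is taken to the nearest constituent marker. A minimal sketch, with hypothetical function names and all markers assumed to lie on the same chromosome as the point:

```python
def bme_clf_distance(marker_locations, bme_lcaf, point):
    """CL-F distance from a BME to a CL-F point, as (d_cl, d_f).

    marker_locations : chromosomal locations of the markers that form the BME
    bme_lcaf         : least common allele frequency of the BME
    point            : (location, lcaf) tuple
    """
    x, y = point
    d_cl = min(abs(location - x) for location in marker_locations)
    d_f = abs(bme_lcaf - y)
    return (d_cl, d_f)


def bme_covers_point(marker_locations, bme_lcaf, point, delta):
    """True if both components are within the covering distance delta."""
    d_cl, d_f = bme_clf_distance(marker_locations, bme_lcaf, point)
    return d_cl <= delta[0] and d_f <= delta[1]


# Hypothetical BME formed by two markers at 1000 bp and 3000 bp.
distance = bme_clf_distance([1000, 3000], 0.25, (3400, 0.25))
```

The point at 3400 bp is 400 bp from the nearer marker, so it is covered for a chromosomal component of 500 bp but not for 300 bp.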

19 A bi-allelic marker equivalent is in (or within) each CL-F matrix cell that contains the

20 chromosomal location of the bi-allelic marker equivalent (BME). (Since the chromosomal location

21 of a bi-allelic marker equivalent (BME) is at any point on the smallest chromosomal segment which

22 contains the one or more markers which form the BME, in some cases, a bi-allelic marker equivalent is

23 in more than one CL-F matrix cell.) 

24 For the purposes of the application, the term true bi-allelic marker is used to distinguish a bi-allelic 

25 marker with two alleles according to usual usage (i.e. at a single polymorphism) from a bi-allelic marker 

26 equivalent (BME). A true bi-allelic marker is not a bi-allelic marker equivalent (BME). The term true bi-allelic

27 polymorphism is used to distinguish a bi-allelic polymorphism with two alleles according to

28 usual usage from a bi-allelic polymorphism equivalent(BPE). 

29 The term true allele of a true bi-allelic marker means an allele of a true bi-allelic marker. 

30 A polymorphism( marker or gene) which is exactly bi-allelic has exactly two alleles and the sum of 

3 1 the frequency of each of the two alleles is 1 ; for example if the two alleles are A and B, then f(A) + f(B) 

32 = 1. A polymorphism that is exactly bi-allelic is a true bi-allelic polymorphism with exactly two true

33 alleles or a bi-allelic polymorphism equivalent (BPE) with exactly two allele equivalents. 

34 A polymorphism(marker or gene) which is approximately bi-allelic has three or more alleles. And 

35 the polymorphism has a first allele and a second allele; and the sum of the frequency of the first allele 

36 and the frequency of the second allele is approximately 1 . And the frequency of the first allele and the 

37 frequency of the second allele is much greater than the sum of the allele frequencies of all the alleles of 

38 the polymorphism that are not the first or the second alleles. For the versions of the invention for bi-allelic

39 polymorphisms (bi-allelic markers and bi-allelic genes) described herein, a polymorphism which

40 is approximately bi-allelic is analyzed as if the polymorphism has only two alleles, the first allele and the 
1 second allele. For the versions of the invention described herein, the least common allele frequency of

2 a polymorphism which is approximately bi-allelic, is the least of the frequencies of the first and the 

3 second alleles of the polymorphism. A polymorphism which is approximately bi-allelic is a true 

4 polymorphism with true alleles (the allele frequencies of the true alleles conform to the stipulations of 

5 this definition) or is a bi-allelic polymorphism equivalent (BPE) with allele equivalents (the allele

6 frequencies of the allele equivalents conform to the stipulations of this definition). 

7 SNP stands for single nucleotide polymorphism. 

8 A statistical linkage test based on allelic association is any mathematical test, mathematical

9 computation or equivalent thereof which gives a quantitative estimate (or equivalent thereof) of

10 evidence for linkage of a polymorphic marker and phenotypic trait (genetic characteristic) based on 

1 1 association between one or more of the alleles of the marker and the phenotypic trait in a sample of 

12 individuals of a population of a species. A statistical linkage test based on allelic association is any 

13 statistical test that detects or suggests linkage on the basis of allelic association. A statistical linkage 

14 test based on allelic association includes tests which suggest but do not prove linkage such as 

15 comparison of marker allele frequencies in disease cases and in unrelated controls. A statistical linkage 

16 test based on allelic association is also any test such as the TDT which may be regarded as "proving" 

17 linkage. (A statistical linkage test based on allelic association can, of course, give an estimate of the 

18 association of one or more allele equivalents of a marker equivalent and a genetic characteristic; see 

19 definition of BME above.) One aspect of a statistical linkage test based on allelic association is its

20 potential use to calculate the probability, or equivalent thereof, that there is genuine association of one 

21 or more of the alleles of the marker and a genetic characteristic for the population as a whole (rather 

22 than just for the sample alone). A statistical linkage test based on allelic association is an association 

23 based linkage test. (The term population in this application is used in a statistical sense and means a 

24 group of individuals. The term population in this application is not used purely in the sense the term 

25 population is used in the field of population genetics.) 
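
As one hedged illustration of a test of this kind, a comparison of marker allele frequencies in disease cases and unrelated controls can be written as a Pearson chi-square test on a 2x2 table of allele counts. This is a sketch under stated assumptions and not a prescribed implementation; the function name and the counts are hypothetical.

```python
import math


def allele_frequency_comparison(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele count table.

    case_counts, control_counts: (count of allele A, count of allele B).
    Returns (chi_square, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    chi_square = n * (a * d - b * c) ** 2 / denominator
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical allele counts in cases and in unrelated controls.
chi_square, p_value = allele_frequency_comparison((60, 40), (40, 60))
```

A small p-value here suggests, but does not prove, linkage, in keeping with the definition above.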

26 The term sample means a group of individuals which is a subset of a population. 

27 In this application, an allele is considered to be a piece of double stranded DNA that is singular or

28 distinctive for the allele. The piece of double stranded DNA that is distinctive for the allele contains the 

29 particular DNA sequence that distinguishes the allele from other alleles (alternate sequences) at the 

30 polymorphic site of interest plus two double stranded "flanking" DNA sequences, one flanking DNA 

3 1 sequence being on one side of the polymorphic site and the other flanking DNA sequence being on the 

32 other side of the polymorphic site. 

33 Alternate strand of an allele: A double stranded piece of DNA that is distinctive for an allele consists 

34 of two pieces of single stranded DNA which are exactly complementary to one another. The two pieces

35 of single stranded DNA are referred to as the two strands of the allele. Each of the two strands of the 

36 allele is the alternate of the other strand of the allele. For the purposes of this definition, the two strands 

37 are referred to as the first strand and the second strand. The alternate strand of the first strand is the 

38 second strand. And the alternate strand of the second strand is the first strand. Each strand of an allele 

39 is exactly complementary to the strand's alternate strand.
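
The relation between a strand and its alternate strand can be illustrated with a short Python sketch. It assumes that each strand is written as a string of the letters A, C, G and T in the 5' to 3' direction, so that the alternate strand is the reverse complement; the function name and the example sequence are hypothetical.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def alternate_strand(strand):
    """Return the exactly complementary strand, read 5' to 3'."""
    return strand.translate(_COMPLEMENT)[::-1]


first_strand = "ATGC"
second_strand = alternate_strand(first_strand)
```

Applying the function to the second strand returns the first strand, which mirrors the statement that each strand is the alternate of the other.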
1 An oligonucleotide is either a single or double stranded oligonucleotide. The length of an

2 oligonucleotide ranges from a few bases or base pairs to approximately any number of bases or base 

3 pairs in the DNA sequence of any allele. 

4 An oligonucleotide, either single or double stranded, is complementary to an allele if the DNA 

5 sequence of each strand of the oligonucleotide is exactly or approximately complementary to all or part

6 of the DNA sequence of one of the DNA strands of the allele and the oligonucleotide has utility in

7 identifying the allele by a hybridization reaction or equivalent thereof similar to that described below

8 under oligonucleotide technology.

9 An allele is identified by a hybridization reaction with an oligonucleotide that is complementary

10 to the allele. In this application there are two types of oligonucleotides that are complementary to 

11 an allele. The two types of oligonucleotides complementary to an allele are identified as type (1) or

12 type (2).

13 A type (1) complementary oligonucleotide is complementary to the part of an allele's DNA sequence

14 that actually contains the allele's polymorphic site; and the type (1) complementary oligonucleotide has

15 utility to identify the allele by means of a hybridization reaction of the oligonucleotide to the part of the

16 allele's DNA sequence that actually contains the allele's polymorphic site. A hybridization reaction of a

17 type (1) oligonucleotide to the part of an allele's DNA sequence that actually contains the allele's

18 polymorphic site is a type (1) hybridization reaction.

19 A type (2) complementary oligonucleotide is complementary to an allele at a DNA sequence that flanks

20 (but does not contain) the allele's polymorphic site; and the type (2) complementary oligonucleotide has

21 utility to identify the allele by means of a hybridization reaction wherein the oligonucleotide hybridizes to

22 the allele at a DNA sequence that flanks (but does not contain) the allele's polymorphic site and

23 identification of the allele is subsequently achieved by extension of the oligonucleotide (and possibly

24 one or more other type (2) complementary oligonucleotides) across the polymorphic site with a DNA

25 polymerase such as occurs, for example, in a standard PCR (polymerase chain reaction). A

26 hybridization reaction of a type (2) oligonucleotide to an allele at a DNA sequence that flanks (but does

27 not contain) the allele's polymorphic site is a type (2) hybridization reaction.

28 Each version of oligonucleotide technology is a means to test for the presence (or absence) of each

29 of one or more true alleles of a group of true alleles in an individual's chromosomal DNA. The presence

30 or absence of any one true allele in the group is tested for by means of a type (1) or type (2)

31 hybridization reaction (or equivalent) with an oligonucleotide that is complementary (type (1) or type (2))

32 to the true allele. Put another way, the presence or absence of each true allele in the group is tested for

33 by means of a type (1) or type (2) hybridization reaction (or equivalent) with an oligonucleotide that is

34 complementary to each true allele in the group. There are many versions of oligonucleotide technology;

35 some of these versions are described in more detail below. (In this application, the term "chromosomal

36 DNA" includes chromosomal DNA obtained directly from an individual as well as DNA obtained as

37 amplification products using PCR and chromosomal DNA obtained directly from an individual.)
1 A physico-chemical signal is any physical (including chemical) signal which is detected by human

2 senses or by apparatus. A physico-chemical signal includes, but is not limited to, (1) an electrical signal

3 such as is generated when oligonucleotides that are attached to a silicon chip hybridize with

4 complementary alleles, (2) a visual or optical signal such as is generated when oligonucleotides 

5 attached to a glass slide hybridize with complementary alleles, (3) a signal (such as a dye color) 

6 generated by the products of a PCR(polymerase chain reaction) such as when oligonucleotides that are 

7 used as primers for PCR reactions hybridize with complementary alleles.

8 The collection of true alleles of a group of one or more bi-allelic markers is defined as consisting 

9 of each true allele of each true marker in the group and each true allele of each haplotype that forms 

10 each allele equivalent of each marker equivalent in the group. 

11 If a set of oligonucleotides is said to be complementary to a group of one or more bi-allelic

12 markers, then each oligonucleotide in the set is type(1 ) or type(2) complementary to at least one of the 

13 true alleles in the collection of true alleles of the group of one or more markers; and there is an 

14 oligonucleotide in the set that is type(1 ) or type(2) complementary to each true allele in the collection of 

15 true alleles of the group of one or more markers. 

16 Sample allele frequency data for a marker and a sample is obtained by pooling DNA specimens 

1 7 from individuals of the sample into one or more DNA pools. An allele frequency for each of the marker's 

18 alleles is obtained for each DNA pool. In the case of a bi-allelic marker, determining the sample allele 

19 frequency for one allele essentially determines the sample allele frequency for the other allele. (For 

20 example, in some association based linkage studies, each DNA pool contains DNA from individuals of 

21 the sample with the same or similar phenotype status.) (It is also possible to obtain sample allele 

22 frequency for a marker and a sample by calculation using genotype data at the marker for each 

23 individual in the sample.) 

24 Genotype data/sample allele frequency data for a marker and a sample is (1 )genotype data at the 

25 marker for each individual of the sample, or (2)a combination of genotype data at the marker for one or 

26 more individuals in the sample and sample allele frequency data for the marker for the sample, or 

27 (3)sample allele frequency data for the marker for the sample. In the case of genotype data, DNA 

28 specimens from individuals are tested individually to determine genotype. In the case of sample allele 

29 frequency data DNA specimens from individuals are pooled, or sample allele frequency is calculated 

30 using genotype data for each individual in the sample. 
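
The calculation of sample allele frequency from individual genotype data, mentioned above as an alternative to pooling, can be sketched as follows. The representation of a genotype as a pair of allele labels, the function name and the example sample are hypothetical.

```python
from collections import Counter


def sample_allele_frequencies(genotypes):
    """Return a dict of allele -> sample allele frequency.

    genotypes: one (allele, allele) pair per individual in the sample.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotype data supplied")
    return {allele: count / total for allele, count in counts.items()}


# Hypothetical genotype data for four individuals at one bi-allelic marker.
genotypes = [("A", "A"), ("A", "B"), ("A", "A"), ("B", "B")]
frequencies = sample_allele_frequencies(genotypes)
```

For a bi-allelic marker the two frequencies sum to 1, so determining one essentially determines the other, as noted above.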
31 

32 Description 

33 

34 For the versions of the invention described herein and the claims, a bi-allelic genetic characteristic 

35 gene or a bi-allelic gene is a gene which is exactly bi-allelic or a gene which is approximately bi-allelic. 

36 For the versions of the invention described herein and the claims, a bi-allelic genetic characteristic 

37 gene or a bi-allelic gene is a gene which is a true bi-allelic gene or a bi-allelic gene equivalent (BGE).

38 A bi-allelic gene equivalent is exactly bi-allelic or approximately bi-allelic. A true bi-allelic gene is exactly 

39 bi-allelic or approximately bi-allelic. 
1 For the versions of the invention described herein and the claims, a bi-allelic marker or a bi-allelic

2 covering marker is a marker which is exactly bi-allelic or a marker which is approximately bi-allelic.

3 Each marker that is exactly bi-allelic is a true bi-allelic marker or a bi-allelic marker equivalent. And

4 each marker that is approximately bi-allelic is a true bi-allelic marker or a bi-allelic marker equivalent

5 (BME). 
6 

7 Process #1: A process for identifying one or more bi-allelic markers linked to a bi-allelic genetic

8 characteristic gene in a species of creatures, comprising the steps of:

9

10 a) choosing two or more bi-allelic covering markers so that a CL-F region is systematically covered by

11 the two or more covering markers;

12 

13 b)choosing a statistical linkage test based on allelic association for each covering marker; 
14 

1 5 c)choosing a sample of individuals for each covering marker ; 

16 

1 7 d)obtaining genotype data/sample allele frequency data for each covering marker and the sample 

1 8 chosen for each covering marker, and obtaining phenotype status data for the genetic characteristic for 

19 each individual in the sample chosen for each covering marker; 
20 

21 e) calculating evidence for linkage between each covering marker and the gene using the statistical

22 linkage test based on allelic association chosen for each covering marker and the genotype 

23 data/sample allele frequency data for each covering marker and using the phenotype status data for the 

24 genetic characteristic for each individual in the sample chosen for each covering marker obtained in d); 

25 and 
26 

27 f) identifying those covering markers as linked to the genetic characteristic gene which show evidence

28 for linkage based on the calculations of step e. 

29 

30 The following is a more detailed description of process #1. 

31 

32 Process #1: A process for identifying one or more bi-allelic markers linked to a bi-allelic genetic

33 characteristic gene in a species of creatures comprising the steps of : 

34 

35 a)choosing two or more bi-allelic covering markers so that a CL-F region is systematically 

36 covered by the two or more covering markers; Any method of systematically covering the CL-F

37 region is acceptable. In this application, the systematic covering of a CL-F region in versions of the

38 invention is described mathematically as the covering of a CL-F region, wherein the CL-F region is N

39 covered to within a CL-F distance δ by two or more bi-allelic covering markers. For further details
regarding this step, see Detailed Description of the Systematic Covering of a CL-F Region Used In
Versions of the Invention below. 

b) choosing a statistical linkage test based on allelic association for each covering marker; The

statistical linkage test based on allelic association chosen for any one particular covering marker is any 
statistical linkage test based on allelic association as defined in the definitions section. Statistical 
linkage tests based on allelic association are described in the genetics and population genetics 
literature and are known to those of ordinary skill in the art. Some examples of a statistical linkage test 
based on allelic association are the TDT, Haplotype Relative Risk Method(HRR) and Allele Frequency 
Comparison In Disease Cases Versus Unrelated Controls. It is possible for different statistical linkage
tests based on allelic association to be chosen for different covering markers. For purposes of technical 
convenience, the same statistical linkage test based on allelic association is preferably chosen for each 
covering marker. 
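
For illustration only, the TDT named above is commonly described in a McNemar form that compares how often heterozygous parents transmit a given marker allele to affected offspring with how often they do not. The sketch below assumes that form; the function name and the counts are hypothetical.

```python
import math


def tdt_statistic(transmitted, not_transmitted):
    """McNemar-type TDT statistic for one marker allele.

    transmitted: number of heterozygous parents who transmitted the allele
                 to an affected offspring.
    not_transmitted: number of heterozygous parents who did not.
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi_square = (transmitted - not_transmitted) ** 2 / total
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical transmission counts for one allele of a covering marker.
tdt_chi_square, tdt_p_value = tdt_statistic(30, 10)
```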



15 c) choosing a sample of individuals from the species for each covering marker; For the process

16 to be workable, the sample chosen for any one covering marker must be suitable for the statistical

17 linkage test of b) above chosen for the covering marker. Knowledge of a suitable sample for the

18 statistical linkage test chosen in b) above for the covering marker is within the understanding of a 

19 person skilled in the art. For purposes of technical convenience, the same sample of individuals is 

20 preferably chosen for each covering marker. 
21 

22 d)obtaining genotype data/sample allele frequency data for each covering marker and the 

23 sample chosen for each covering marker, and obtaining phenotype status data for the genetic 

24 characteristic for each individual in the sample chosen for each covering marker; 

25 Sample allele frequency data for any one covering marker for the sample chosen for the covering

26 marker is obtained by pooling DNA from individuals of the sample into one or more DNA pools. It is also 

27 possible to obtain sample allele frequency data for any one covering marker by calculation using 

28 genotype data at the marker for each individual in the sample. Each DNA pool contains DNA from 

29 individuals of the sample with the same or similar phenotype status. An allele frequency for each of the 

30 marker's alleles is obtained for each pool. Genotype data/sample allele frequency data for any one 

3 1 covering marker is ( 1 )genotype data at the covering marker for each individual in the sample chosen for 

32 the covering marker, or (2)a combination of genotype data at the covering marker for one or more 

33 individuals in the sample chosen for the covering marker and sample allele frequency data for the 

34 covering marker for the sample chosen for the covering marker, or (3) sample allele frequency data for

35 the covering marker for the sample chosen for the covering marker. The genotype data/sample allele 

36 frequency data for any one covering marker must be suitable for the statistical linkage test based on 

37 allelic association chosen for the covering marker in step b). It is possible to choose different types of

38 genotype data/sample allele frequency data for each covering marker. For purposes of technical 

39 convenience, the same type of genotype data/sample allele frequency data (1 ), (2), or (3) is chosen for 
1 each covering marker. Some examples of ways to practice this step are the use of technology cited

2 under Oligonucleotide Technology (below) or mass spectrometry (such as MALDI-TOF).

3 

4 e) calculating evidence for linkage between each covering marker and the gene using the

5 statistical linkage test based on allelic association chosen for each covering marker and the 

6 genotype data/sample allele frequency data for each covering marker and using the phenotype 

7 status data for the genetic characteristic for each individual in the sample chosen for each 

8 covering marker obtained in d); and 
9 

10 

11 f) identifying those covering markers as linked to the gene which show evidence for linkage

12 based on the calculations of step e. 

13 The meanings of steps d, e and f are within the understanding of those of ordinary skill in the art. Fine 

14 points of using a statistical linkage test based on allelic association as a measure of evidence for 

15 linkage are known to those in the art.
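
Steps e) and f) can be sketched, without limitation, as applying a chosen test to the data for each covering marker and then keeping the markers whose evidence meets a chosen threshold. In the sketch below the function names, the data layout and the threshold are hypothetical, and the absolute difference in allele frequency between cases and controls is only a toy stand-in for a statistical linkage test based on allelic association.

```python
def identify_linked_markers(marker_data, linkage_test, threshold):
    """Apply a linkage test to each covering marker and select linked markers.

    marker_data: dict of marker name -> data suitable for linkage_test.
    linkage_test: callable returning a quantitative estimate of evidence.
    threshold: minimum evidence required to identify a marker as linked.
    Returns (evidence per marker, list of markers identified as linked).
    """
    evidence = {name: linkage_test(data) for name, data in marker_data.items()}  # step e)
    linked = [name for name, score in evidence.items() if score >= threshold]  # step f)
    return evidence, linked


def frequency_difference(data):
    """Toy stand-in: absolute allele frequency difference, cases vs controls."""
    case_frequency, control_frequency = data
    return abs(case_frequency - control_frequency)


# Hypothetical sample allele frequencies (cases, controls) for three markers.
marker_data = {"M1": (0.30, 0.29), "M2": (0.45, 0.30), "M3": (0.12, 0.11)}
evidence, linked = identify_linked_markers(marker_data, frequency_difference, 0.10)
```

Passing the test as a parameter reflects the statement above that different tests may be chosen for different covering markers, although the same test is preferably used for each.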
16 

17 Process #1 described above is equivalent to localizing a genetic characteristic gene to a particular

18 chromosomal location (i.e. a sub-region of a particular chromosome.) This is because markers which

19 are linked to a gene are also physically close to the gene in terms of physical (chromosomal) location.

20 To locate a gene causing the genetic characteristic of Process #1, the gene is localized to the

21 approximate chromosomal location of one or more covering markers which are identified as showing

22 evidence for linkage in step f). 

23 Process #1A: It is also possible to use Process #1 to localize a genetic characteristic gene to an

24 approximate CL-F location(chromosomal location-least common allele frequency location). Such a 

25 process is expressed as follows: 

26 Process #1A: A process for localizing a bi-allelic genetic characteristic gene in a species of

27 creatures to a chromosomal location-least common allele frequency (CL-F) location, comprising 

28 the steps a), b), c), d) and e) of Process #1 and further comprising the step of: 

29 f) localizing the gene to the chromosomal location-least common allele frequency (CL-F) location

30 of one or more markers that show evidence for linkage based on the calculations of step e). 

31 It is the teaching of this application that the strength of evidence for linkage increases as markers that 

32 are in linkage disequilibrium with a gene become close to the gene on a CL-F map. It is possible for 

33 step f) to be done by an individual plotting data by hand and examining the data. It is also possible for 

34 software to perform step f). It is possible for this step to include using the dependence of quantitative 

35 evidence for linkage of step e) on CL-F location. For example, if quantitative evidence for linkage 

36 calculated in step e) (of process #1 or #1 A) is represented in the z dimension of a typical three- 

37 dimensional x-y-z plot, wherein the x and y dimensions are chromosomal location and least common 

38 allele frequency respectively, then it is possible to conceptualize evidence for linkage as occurring in a 

39 "hump" (or "humps" )in the z dimension. And it is possible to use the evidence for linkage calculated in 

40 step e) of (process #1 or #1 A) to find the CL-F location (in the x-y plane) of the peak(s) of a "hump(s)", 



WO 99/43858    PCT/US99/04376

1 thus helping to localize a trait causing gene to the CL-F locale of the peak(s) of the "hump(s)". For 

2 example, it is possible to use computer programming techniques that detect gradients such as, for

3 example, linear or nonlinear programming techniques in mathematical optimization theory to find the

4 peak(s) of a hump(s) in this step.

5 (Process #1 A described above is equivalent to localizing a genetic characteristic gene to a particular 

6 chromosomal location (i.e. a sub-region of a particular chromosome.) This is because localizing a gene 

7 to a particular CL-F region also localizes the gene to a particular chromosomal region.) 
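
A minimal sketch of locating the peak of a "hump" is given below. It does not use a gradient or mathematical programming technique; it simply searches every tested CL-F location for the greatest evidence, which is an assumption made here for brevity. The function name and the data points are hypothetical.

```python
def find_peak(evidence_points):
    """Return the CL-F location with the greatest evidence for linkage.

    evidence_points: iterable of (chromosomal location,
                     least common allele frequency, evidence) triples,
                     i.e. the x, y and z values of the plot described above.
    Returns ((chromosomal location, least common allele frequency), evidence).
    """
    location, frequency, evidence = max(evidence_points, key=lambda p: p[2])
    return (location, frequency), evidence


# Hypothetical evidence for linkage at four covering markers.
points = [(1.0, 0.10, 0.8), (1.5, 0.20, 2.9), (2.0, 0.20, 4.1), (2.5, 0.30, 1.7)]
peak_location, peak_evidence = find_peak(points)
```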

8 Software 

9 A computer program that executes each step of Process#1 is an example of Process#1. A computer 

10 program that executes each step of Process#1 A is an example of Process#1 A. A flowsheet illustrating 

1 1 programs that execute Process#1 and Process#1 A is entitled Drawing #1 (see drawing section). It is 

12 also possible for a computer program to execute any one of(or one or more combinations of) the steps 

13 of Process#1 or Process#1 A. A person of ordinary skill in the art could write such a program without 

14 undue experimentation. The level of skill at computer programming in the art is great as evidenced by 

1 5 numerous computer programs. Some computer programs in the art are programs such as 

16 MAPMAKER/SIBS, GENEHUNTER, LINKAGE, and FASTLINK.

17 Detailed Description of the Systematic Covering of a CL-F Region Used In Versions of the Invention 

18 (see definitions section for meaning of CL-F region that is systematically covered). The CL-F region and 

19 covering markers are for a species and the one or more individuals are members of the species. The 

20 chromosomal location coordinate of each covering marker is based on information regarding the 

21 chromosomal location of each covering marker. One such source of information is chromosomal maps. 

22 Chromosomal maps are provided by such institutions as the Whitehead Institute or Marshfield 

23 Foundation for Biomedical Research. Chromosomal maps include, but are not limited to genetic maps, 

24 physical maps, and radiation hybrid maps. 

25 The least common allele frequency coordinate of each covering marker is based on any reasonable 

26 information regarding the least common allele frequency of each covering marker. It is possible to use

27 information from different populations for the allele frequencies of different covering markers. For 

28 example, it is possible for the least common allele frequencies of two different covering markers to be 

29 based on information from two different, but similar populations. For purposes of technical convenience, 

30 the least common allele frequency of each covering marker is based on information from the same 

3 1 population. One source of information on least common allele frequency is institutions which provide 

32 chromosomal maps such as the Whitehead Institute or Marshfield Foundation for Biomedical Research.

33 Systematic Covering Of A CL-F Region, Wherein A CL-F Region Is N Covered To Within A CL-F

34 Distance δ By Two Or More Bi-Allelic Covering Markers

35 In this application, the systematic covering of a CL-F region in versions of the invention is described 

36 mathematically as the covering of a CL-F region, wherein the CL-F region is N covered to within a CL-F

37 distance δ by two or more bi-allelic covering markers. The covering markers are chosen so that the CL-F

38 region is N covered to within the CL-F distance δ by using information regarding the chromosomal

39 location and least common allele frequency of each covering marker. 
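
As a hedged illustration of N covering, the sketch below counts, for each of a finite set of CL-F points, the covering markers lying within the CL-F distance of that point, and reports whether every point has at least N such markers. Checking a finite set of points only approximates checking a whole region; each marker is treated as having a single chromosomal location; and the function names, coordinates (in base pairs) and frequencies are hypothetical.

```python
def covering_count(point, markers, delta):
    """Count the markers within the CL-F distance delta of a CL-F point.

    point, and each entry of markers: (chromosomal location,
                                       least common allele frequency).
    delta: (chromosomal location component, frequency component).
    """
    point_cl, point_f = point
    delta_cl, delta_f = delta
    return sum(
        1
        for marker_cl, marker_f in markers
        if abs(marker_cl - point_cl) <= delta_cl and abs(marker_f - point_f) <= delta_f
    )


def is_n_covered(points, markers, delta, n):
    """Return True if every listed point is covered by at least n markers."""
    return all(covering_count(p, markers, delta) >= n for p in points)


markers = [(100_000, 0.10), (300_000, 0.15), (500_000, 0.20), (700_000, 0.12)]
points = [(200_000, 0.12), (400_000, 0.18), (600_000, 0.15)]
delta = (250_000, 0.1)
two_covered = is_n_covered(points, markers, delta, 2)
three_covered = is_n_covered(points, markers, delta, 3)
```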

40 It is possible for the chromosomal location component of δ to be as great as about any chromosomal

41 length, computed by any method, for which linkage disequilibrium has been observed between any 
1 polymorphisms in any population of the species. It is preferable in terms of increasing the power of a

2 version of the invention for linkage studies that the chromosomal location component of δ be less than

3 about the greatest chromosomal length, computed by any method, for which linkage disequilibrium has

4 been observed between any polymorphisms in any population of the species. In general, the smaller

5 the chromosomal location component of δ, the greater the power of a version of the invention for

6 linkage studies. 

7 It is possible for the frequency distance component of δ to be as great as about 0.2. (Depending on the

8 penetrance ratio (r) or the disequilibrium between marker and gene, it is also possible for the frequency

9 distance component of δ to be greater than 0.2 under some conditions as evidenced by Table 2 under

10 Theory of Operation. So it is also possible for the frequency distance component of δ to be as great as

11 about 0.25 or higher.) It is preferable in terms of increasing the power of a version of the invention for

12 linkage studies that the frequency distance component of δ be less than about 0.2. In general, the

13 smaller the frequency distance component of δ, the greater the power of a version of the invention for

14 linkage studies. 

15 Linkage disequilibrium has been observed between polymorphisms separated by 10 to 12 cM in some 

16 homogeneous human populations. Therefore, it is possible for the chromosomal location distance

17 component of δ to be as large as about 10 to 12 cM, about 10 to 12 million bp, or the equivalent thereof

18 for homogeneous human populations. It is preferable in terms of increasing the power of a version of

19 the invention for linkage studies in human populations that δ is less than or equal to about [1 million bp,

20 0.15] or the equivalent thereof. It is more preferable in terms of increasing the power of a version of the

21 invention for linkage studies in human populations that δ is less than or equal to about [250,000 bp,

22 0.1] or the equivalent thereof.

23 In general, the smaller the magnitude of δ is in terms of either frequency distance, chromosomal

24 location distance, or both, the greater the power of a version of the invention for linkage studies. In 

25 general, the greater N is, the greater the power of a version of the invention for linkage studies. 

26 This is because the greater N is, the greater the chance that linkage is detected between one or more covering

27 markers and a gene or genes. The largest that N is chosen is limited by the number of known markers 

28 in the neighborhood of the CL-F region and also by the distribution of the known markers. 

29 In general, the larger the CL-F region which is N covered, the greater the power of a version of the 

30 invention for linkage studies, because a larger region is scanned (covered). Less dense coverings 

31 wherein N is small, and the magnitude of δ is large also have technical and economic advantages for

32 certain situations. 

33 Specific types of CL-F regions that are N covered 

34 Specific types of CL-F regions that are N covered are useful. For example, a rectangular CL-F region, a 

35 segment-subrange, that is N covered is used in an association based linkage study to test for the 

36 presence of a trait causing bi-allelic gene located within the segment-subrange. In the case in which a 

37 group of points is N covered to within a CL-F distance [x,y] and the group of points is connected to

38 within a CL-F distance of [2x,2y] or less, then a path connected CL-F region is N covered to within the

39 CL-F distance [x,y]. 
1 A CL-F matrix is a device to illustrate and describe the systematic nature of special cases of CL-F

2 regions that are N covered. In the case in which there are N or more markers within each cell of a CL-F

3 matrix, then each point within the matrix is N covered to within the CL-F distance [Lcm, Wcm], wherein

4 Lcm is the length of a matrix cell and Wcm is the width of a matrix cell. A choice of covering markers so

5 that approximately the same number of covering markers are in each cell of a CL-F matrix has utility in 

6 that approximately the same amount of effort is expended on each subregion (cell) of the CL-F region 

7 defined by the matrix in a linkage study using the covering markers. If the centerpoints of a CL-F matrix 

8 (a matrix centerpoint lattice) are each N covered by a group of covering markers to within a CL-F 

9 distance [x,y], then each point in the matrix is N covered to within the CL-F distance [2x,2y]. A CL-F 

10 matrix can be used as a device to help distinguish versions of the invention from prior art (to the extent 

1 1 that there is prior art). 
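
The use of a CL-F matrix can be sketched as assigning each covering marker to a matrix cell and checking that every cell holds N or more markers. The sketch assumes a rectangular matrix of equal cells, one chromosomal location per marker and markers that do not fall exactly on a cell boundary; the function names, cell dimensions and marker coordinates are hypothetical.

```python
from collections import Counter


def markers_per_cell(markers, cl_start, f_start, cell_length, cell_width, n_cols, n_rows):
    """Count the markers that fall within each cell of a CL-F matrix.

    markers: (chromosomal location, least common allele frequency) pairs.
    cell_length: extent of a cell along the chromosomal location axis (Lcm).
    cell_width: extent of a cell along the frequency axis (Wcm).
    Returns a Counter keyed by (column, row).
    """
    counts = Counter()
    for marker_cl, marker_f in markers:
        col = int((marker_cl - cl_start) // cell_length)
        row = int((marker_f - f_start) // cell_width)
        # Markers outside the matrix are ignored.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[(col, row)] += 1
    return counts


def every_cell_has_n(counts, n_cols, n_rows, n):
    """Return True if each cell of the matrix contains n or more markers."""
    return all(counts[(c, r)] >= n for c in range(n_cols) for r in range(n_rows))


markers = [(100_000, 0.10), (400_000, 0.40), (600_000, 0.05), (900_000, 0.30)]
counts = markers_per_cell(markers, 0, 0.0, 500_000, 0.25, 2, 2)
one_per_cell = every_cell_has_n(counts, 2, 2, 1)
two_per_cell = every_cell_has_n(counts, 2, 2, 2)
```

Per the passage above, when each cell holds N or more markers, each point within the matrix is N covered to within the CL-F distance [Lcm, Wcm], here [500,000 bp, 0.25].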

A requirement that the CL-F region that is N covered to within a certain CL-F distance comprise a
certain minimum area or segment-subrange with a certain minimum area is a special case of CL-F
regions that are N covered to within the certain CL-F distance. A requirement that the CL-F region that
is N covered to within a certain CL-F distance has a certain length or width is a special case of CL-F
regions that are N covered to within the certain CL-F distance. Each of these requirements is also a
device that can be used to help distinguish versions of the invention from prior art.

A Note on the Equivalence of Working With Individual Alleles of Markers to Perform Two-dimensional
Linkage Studies and the CL-F approach using bi-allelic markers

It is possible to conceptualize performing two-dimensional linkage studies wherein individual marker
alleles are used to cover a two-dimensional space, rather than individual bi-allelic markers. Any
individual marker allele is assigned a two-dimensional location consisting of the chromosomal location
of the marker and the allele frequency of the marker allele. Two-dimensional chromosomal
location-allele frequency spaces (or regions) are systematically covered by sets of covering alleles. Each
individual covering allele is tested for association with a genetic characteristic. Versions of inventions
using systematic chromosomal location-allele frequency (CL-AF) region coverings that are similar to
versions of the invention in this application are possible. Indeed, these types of inventions have been
described in U.S. Provisional Patent Applications previously filed by the inventor.

However, such a conceptual framework and the resulting inventions are equivalent to the CL-F
approach used in this application. This is because any marker allele, A, that is used as a covering allele
can be made to be an allele equivalent of a bi-allelic marker equivalent (BME), so that a BME with allele
equivalents A and nonA is a bi-allelic marker with allele A. Therefore, any set of covering alleles that
systematically cover a two-dimensional CL-AF region is equivalent to a set of BMEs that systematically
cover an equivalent CL-F region. Testing each covering allele for association with a genetic
characteristic is exactly equivalent to testing each BME of a set of BMEs for evidence of linkage to a
gene using a statistical linkage test based on allelic association. Even testing for the presence or
absence of a covering allele in the chromosomal DNA of an individual is equivalent to genotyping the
individual at a BME. And determining a sample allele frequency for a covering allele is equivalent to
determining the sample allele frequencies for a BME.
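
The mapping from a covering allele to a BME can be written out in a few lines. The Python sketch below
is a hedged illustration, not the inventor's implementation: the record types and the function name
to_bme are hypothetical. It assumes that the BME built from allele A has the two allele equivalents A
and nonA, so that its least common allele frequency is the smaller of p and 1 - p, where p is the
population frequency of A.

```python
from typing import NamedTuple, Tuple


class CoveringAllele(NamedTuple):
    """Hypothetical record: one marker allele placed on a CL-AF map."""
    marker: str
    allele: str
    location_cm: float  # chromosomal location of the marker
    allele_freq: float  # population frequency of this allele, 0 to 1


class BME(NamedTuple):
    """Hypothetical record: a bi-allelic marker equivalent placed on a CL-F map."""
    marker: str
    allele_equivalents: Tuple[str, str]
    location_cm: float
    least_common_freq: float  # 0 to 0.5


def to_bme(a: CoveringAllele) -> BME:
    """Treat allele A as one allele equivalent and every other allele as nonA."""
    non_a_freq = 1.0 - a.allele_freq
    return BME(
        marker=a.marker,
        allele_equivalents=(a.allele, "non" + a.allele),
        location_cm=a.location_cm,
        least_common_freq=min(a.allele_freq, non_a_freq),
    )
```

Under this assumption the chromosomal location is unchanged and only the frequency coordinate is
folded into the range 0 to 0.5, which is one way to read the statement that a CL-AF covering
corresponds to an equivalent CL-F covering.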

Example 1 of Process #1 is used for identifying markers linked to a disease gene.

Example 1: A process for identifying bi-allelic markers linked to a bi-allelic disease gene in human
beings, comprising the steps of:

a) choosing two or more bi-allelic covering markers so that a CL-F region is N covered to within a CL-F
distance [250,000 bp, 0.1] or the equivalent thereof by the covering markers, wherein N is an integer
number greater than or equal to 2;

b) choosing the same statistical linkage test based on allelic association for each covering marker;

c) choosing the same sample of individual human beings for each covering marker;

d) obtaining genotype data at each covering marker for each individual in the sample and obtaining
phenotype status data for the disease for each individual in the sample;

e) calculating evidence for linkage between each covering marker and the gene using the test chosen in
step b), the genotype data at each covering marker and the phenotype status data for the
disease for each individual in the sample; and

f) identifying those covering markers as linked to the gene which show evidence for linkage based on
the calculations of step e).
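
Steps e) and f) might be carried out as in the following Python sketch. The text does not fix a
particular statistical test, so the 2x2 chi-square test on allele counts in affected versus unaffected
individuals, the threshold of 3.841 (the 5% critical value for one degree of freedom) and all names
below are assumptions chosen for illustration. No correction for testing many markers is applied.

```python
def allele_counts(genotypes, allele):
    """Return (copies of allele, copies of other alleles) for two-letter genotypes such as 'Aa'."""
    carried = sum(g.count(allele) for g in genotypes)
    return carried, 2 * len(genotypes) - carried


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so there is no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def linked_markers(case_genotypes, control_genotypes, allele="A", threshold=3.841):
    """Step e): compute the statistic per marker. Step f): keep markers above the threshold."""
    linked = []
    for marker in case_genotypes:
        a, b = allele_counts(case_genotypes[marker], allele)
        c, d = allele_counts(control_genotypes[marker], allele)
        if chi_square_2x2(a, b, c, d) > threshold:
            linked.append(marker)
    return linked
```

Here case_genotypes and control_genotypes are dictionaries mapping a marker name to the list of
genotypes of the affected and unaffected individuals, so the same test and the same sample are used
for every covering marker, as steps b) and c) require.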

Apparatus Versions

General step by step descriptions of individual apparatus versions are given below.

Apparatus #1, an apparatus to practice process #1.

Apparatus #1: An apparatus for identifying bi-allelic markers linked to a bi-allelic genetic characteristic
gene in a species of creatures, comprising:

a) means for choosing two or more bi-allelic covering markers so that a CL-F region is systematically
covered by the two or more covering markers;

b) means for choosing a statistical linkage test based on allelic association for each covering marker;

c) means for choosing a sample of individuals for each covering marker;

d) means for obtaining genotype data/sample allele frequency data for each covering marker and the
sample chosen for each covering marker, and for obtaining phenotype status data for the genetic
characteristic for each individual in the sample chosen for each covering marker;

e) means for calculating evidence for linkage between each covering marker and the gene using the
statistical linkage test based on allelic association chosen for each covering marker and the genotype
data/sample allele frequency data for each covering marker and using the phenotype status data for the
genetic characteristic for each individual in the sample chosen for each covering marker obtained in d);
and

f) means for identifying those covering markers as linked to the gene which show evidence for linkage
based on the calculations by means e).

16 

More detailed description of Apparatus #1: Apparatus #1 is an apparatus to practice
process #1. More details of the description of apparatus #1 are found under the description of Process
#1 above. Any one of the means labeled a), b), c), d), e) or f) of apparatus #1 includes any means for
automating or partially automating a step such as step a), b), c), d), e) or f), respectively, of process #1. An
example of any one of the means in this paragraph labeled a), b), c), d), e) or f) is means comprising
an appropriately programmed, suitable computer, the computer being supplied with proper data and
instructions.

The means labeled d) of apparatus #1 for obtaining genotype data/sample allele frequency data for
each covering marker for the sample chosen for each covering marker includes any automated or
partially automated means to obtain genotype data/sample allele frequency data. An example of
means to obtain genotype data/sample allele frequency data is means using mass spectrometry.'
Means to obtain genotype data/sample allele frequency data that is automated or partially automated
includes means comprising Oligonucleotide Technology described below.

Apparatus #1A, an apparatus to practice process #1A.

Apparatus #1A: An apparatus for localizing a bi-allelic genetic characteristic gene in a species
of creatures to a chromosomal location-least common allele frequency (CL-F) region,
comprising the means a), b), c), d) and e) of Apparatus #1 and further comprising the means of:
f) means for localizing the gene to the approximate chromosomal location-least common allele
frequency (CL-F) region of one or more markers that show evidence for linkage based on the
calculations of means e).
An example of means f) is means comprising an appropriately programmed, suitable computer, the
computer being supplied with proper data and instructions. Further details of this apparatus, which
practices process #1A, are under process #1 and process #1A and Software (above).

4 

Genotype data/Sample allele frequency data apparatus
An apparatus to obtain genotype data/sample allele frequency data similar to the data of the step d) of
process #1 has great utility in that it is used to provide genotype data/sample allele frequency data for
the more powerful two-dimensional linkage studies introduced in this application.
ApparatusGd/Safd#1: Genotype data/Sample allele frequency data apparatus: An apparatus for
obtaining genotype data/sample allele frequency data for each bi-allelic marker of a group of
two or more bi-allelic covering markers in the chromosomal DNA of one or more individuals of a
sample, comprising:
a) means for determining information on the presence or absence of each allele of each bi-allelic
marker of a group of two or more bi-allelic covering markers in the chromosomal DNA of one or
more individuals of the sample, a CL-F region being systematically covered by the two or more
bi-allelic covering markers; and
b) means for transforming the information of step a) into genotype data/sample allele frequency
data for each marker of the group.

The CL-F region and covering markers are for a species and the one or more individuals are members
of the species. Means for determining information on the presence or absence of each allele of each
bi-allelic marker of the group in chromosomal DNA includes any means of determination. Means for
determining information on the presence or absence of each allele of each bi-allelic marker of the group
in chromosomal DNA includes means comprising oligonucleotide technology by using a set of
oligonucleotides that is complementary to the group as discussed below. Information on the presence
or absence of each allele in the chromosomal DNA is obtained using a DNA specimen from each of one
or more individuals of the sample or by using one or more DNA pools of DNA specimens from two or
more individuals of the sample. Any apparatus that obtains genotype data or sample allele frequency
data (similar to the data of the step d) of process #1) by determining the presence or absence of each
allele of each bi-allelic marker of the group in the chromosomal DNA of one or more individuals is an
example of this version of the invention. Versions of this apparatus also obtain a combination of
genotype data and sample allele frequency data similar to the data of the step d) of process #1. The
details of step b) will be clear to those of ordinary skill in the art.
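
Although the details of the transformation of step b) are left to the skilled reader, a minimal Python
sketch may make the idea concrete. It is an illustration under stated assumptions, not the described
apparatus: the function names are hypothetical, each individual is assumed to yield one
presence/absence reading per allele of a bi-allelic marker, and the individuals are assumed to be
assayed one at a time rather than in a DNA pool.

```python
def genotype_from_presence(allele_1_present, allele_2_present, allele_1="A", allele_2="a"):
    """Turn presence/absence readings for the two alleles of one marker into a genotype."""
    if allele_1_present and allele_2_present:
        return allele_1 + allele_2   # heterozygote
    if allele_1_present:
        return allele_1 * 2          # homozygote for allele 1
    if allele_2_present:
        return allele_2 * 2          # homozygote for allele 2
    return None                      # neither allele detected: no genotype call


def sample_allele_frequency(genotypes, allele):
    """Fraction of chromosomes in the sample carrying the allele, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(g.count(allele) for g in called) / (2 * len(called))
```

The first function yields genotype data for one individual at one marker; the second derives sample
allele frequency data from the genotypes of a sample.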

Each bi-allelic covering marker is a true bi-allelic marker or a BME. Determining the presence or absence of each
allele of each bi-allelic marker in the group includes determining the presence or absence of each allele
equivalent of each bi-allelic marker equivalent (BME) in the group. Any method of systematically
covering the CL-F region is acceptable. In this application, the systematic covering of a CL-F region in
versions of the invention is described mathematically as the covering of a CL-F region, wherein the
CL-F region is N covered to within a CL-F distance δ by two or more bi-allelic covering markers. For further
details regarding this, see Detailed Description of the Systematic Covering of a CL-F Region Used In
Versions of the Invention above.

An example of ApparatusGd/Safd#1 Genotype data/Sample allele frequency data apparatus, a
sample allele frequency apparatus:

Example 1 of ApparatusGd/Safd#1: An apparatus for obtaining genotype data/sample allele frequency
data for each bi-allelic marker of a group of two or more bi-allelic covering markers in the chromosomal
DNA of one or more individuals of a sample, wherein the genotype data/sample allele frequency data is
sample allele frequency data, comprising:
a) means for determining information on the presence or absence of each allele of each bi-allelic
marker of a group of two or more bi-allelic covering markers in the chromosomal DNA from one or more
individuals of the sample, a CL-F region being N covered to within the CL-F distance [1.0 cM, 0.15] by
the two or more bi-allelic covering markers, wherein N is an integer number greater than or equal to 1;
and
b) means for transforming the information of step a) into sample allele frequency data for each marker
of the group.

Example 2 of ApparatusGd/Safd#1: An apparatus for obtaining genotype data/sample allele frequency
data for each bi-allelic marker of a group of two or more bi-allelic covering markers in the chromosomal
DNA of an individual, wherein the genotype data/sample allele frequency data is genotype data,
comprising:
a) means for determining information on the presence or absence of each allele of each bi-allelic
marker of a group of two or more bi-allelic covering markers in the chromosomal DNA from an
individual, a CL-F region being N covered to within the CL-F distance [12cM, 0.25] or the equivalent
thereof by the two or more bi-allelic covering markers, wherein N is an integer number greater than or
equal to 1; and
b) means for transforming the information of step a) into genotype data for each marker of the group.

(It should be noted that the following genotype apparatus is equivalent to Example 2 of
ApparatusGd/Safd#1: Genotype Apparatus: An apparatus for genotyping an individual, comprising:
a) means to genotype an individual at two or more bi-allelic covering markers, a CL-F region being N
covered to within the CL-F distance [12cM, 0.25] or the equivalent thereof by the two or more bi-allelic
covering markers, wherein N is an integer number greater than or equal to 1.)

Genotype data/Sample allele frequency data process

A process to obtain genotype data/sample allele frequency data similar to the data of the step d) of
process #1 has great utility in that it is used to provide genotype data/sample allele frequency data for
the more powerful two-dimensional linkage studies introduced in this application.

Description of the Genotype data/Sample allele frequency data process.
ProcessGd/Safd#1: Genotype data/Sample allele frequency data process: A process for
obtaining genotype data/sample allele frequency data for each bi-allelic marker of a group of
two or more bi-allelic covering markers in the chromosomal DNA of one or more individuals of a
sample, comprising:
a) determining information on the presence or absence of each allele of each bi-allelic marker of
a group of two or more bi-allelic covering markers in the chromosomal DNA of one or more
individuals of the sample, a CL-F region being systematically covered by the two or more
bi-allelic covering markers; and
b) transforming the information of step a) into genotype data/sample allele frequency data for
each marker of the group.

The CL-F region and covering markers are for a species and the one or more individuals are members
of the species. Determining information on the presence or absence of each allele of each bi-allelic
marker of the group in chromosomal DNA includes any method of determination. Determining
information on the presence or absence of each allele of each bi-allelic marker of the group in
chromosomal DNA includes methods comprising oligonucleotide technology by using a set of
oligonucleotides that is complementary to the group as discussed below. Information on the presence
or absence of each allele in the chromosomal DNA is obtained using a DNA specimen from each of one
or more individuals of the sample or by using one or more DNA pools of DNA specimens from two or
more individuals of the sample. Any process that obtains genotype data or sample allele frequency data
(similar to the data of the step d) of process #1) by determining the presence or absence of each allele
of each bi-allelic marker of the group in the chromosomal DNA of one or more individuals is an example
of this version of the invention. Versions of this process also obtain a combination of genotype data and
sample allele frequency data similar to the data of the step d) of process #1. The details of step b) will
be clear to those of ordinary skill in the art.

Each bi-allelic covering marker is a true bi-allelic marker or a BME. Determining the presence or absence of each
allele of each bi-allelic marker in the group includes determining the presence or absence of each allele
equivalent of each bi-allelic marker equivalent (BME) in the group. Any method of systematically
covering the CL-F region is acceptable. In this application, the systematic covering of a CL-F region in
versions of the invention is described mathematically as the covering of a CL-F region, wherein the
CL-F region is N covered to within a CL-F distance δ by two or more bi-allelic covering markers. For further
details regarding this, see Detailed Description of the Systematic Covering of a CL-F Region Used In
Versions of the Invention above.

An example of ProcessGd/Safd#1 Genotype data/Sample allele frequency data process, a
genotype data process:

Example 1 of ProcessGd/Safd#1: A process for obtaining genotype data/sample allele frequency data
for each bi-allelic marker of a group of two or more bi-allelic covering markers in the chromosomal DNA
of an individual, wherein the genotype data/sample allele frequency data is genotype data, comprising:

a) determining information on the presence or absence of each allele of each bi-allelic
marker of a group of two or more bi-allelic covering markers in the chromosomal DNA from an
individual, a CL-F region being N covered to within the CL-F distance [12cM, 0.25] or the equivalent
thereof by the two or more bi-allelic covering markers, wherein N is an integer number greater than or
equal to 1; and

b) transforming the information of step a) into genotype data for each marker of the group.

(It should be noted that the following genotype process is equivalent to Example 1 of
ProcessGd/Safd#1: Genotype Process: A process for genotyping an individual, comprising:
a) genotyping an individual at two or more bi-allelic covering markers, a CL-F region being N
covered to within the CL-F distance [12cM, 0.25] or the equivalent thereof by the two or more bi-allelic
covering markers, wherein N is an integer number greater than or equal to 1.)

Oligonucleotide technology 

Each version of oligonucleotide technology is a means to sense the presence or absence of each of
one or more true alleles of a group of true alleles in chromosomal DNA from one or more individuals by
means of a hybridization reaction with an oligonucleotide that is complementary to each of the one or
more true alleles (see definitions section). Thus versions of oligonucleotide technology are a means of
genotyping one or more individuals. And versions of oligonucleotide technology are a means of
obtaining sample allele frequency data for one or more marker alleles for a sample of individuals using
pooled DNA from the individuals in the sample.

In Some Versions of Oligonucleotide Technology for Genotyping or Obtaining Sample Allele Frequency
Data, a Physico-chemical Signal is Generated when an Allele in Chromosomal DNA and a
Complementary Oligonucleotide Hybridize

Some versions of oligonucleotide technology for genotyping or for obtaining sample allele frequency
data use a sensor which includes one or more oligonucleotides which are complementary to an allele.
When the sensor is exposed to chromosomal DNA from an individual who carries the allele, the
oligonucleotides which are complementary to the allele hybridize with chromosomal DNA specimens of
the allele. The hybridization generates a physico-chemical signal which indicates the presence of the
allele in the chromosomal DNA of the individual. The lack of the physico-chemical signal indicates no
(or negligible) hybridization and that the allele is not present in the chromosomal DNA of an individual.

Examples of oligonucleotide technology for genotyping, obtaining sample allele frequency data or
genotype data/sample allele frequency data

Companies like Affymetrix are using high density arrays of oligonucleotides attached to silicon chips or
glass slides to genotype DNA from one individual at thousands of bi-allelic markers.' In some of these
versions of oligonucleotide technology, the strength of hybridization of oligonucleotides that differ at
only one base to DNA containing an SNP are compared to determine genotype." Another version of
oligonucleotide technology uses oligonucleotides as PCR (Polymerase Chain Reaction) primers to
obtain genotype data."' Other examples of oligonucleotide technology and its uses to obtain genetic
information are included in the articles cited in the endnotes.'^ Versions of oligonucleotide technology
obtain sample allele frequency data from pooled DNA or genotype data using oligonucleotides as PCR
primers to obtain amplified reaction products that are detected by mass spectrometry. Another example
of oligonucleotide technology is padlock probes.*

Other examples of oligonucleotide technology are minisequencing on DNA arrays, dynamic
allele-specific hybridization, microplate array diagonal gel electrophoresis, pyrosequencing,
oligonucleotide-specific ligation, the TaqMan system and immobilized padlock probes as presented at the First
International Meeting on Single Nucleotide Polymorphism and Complex Genome Analysis.**

Sets of Oligonucleotides for Genotyping at Bi-allelic Markers or Obtaining Sample Allele Frequency
Data

A set of oligonucleotides that is complementary (see definitions) to a group of one or more bi-allelic
markers has utility to determine genotype data at each of the markers in the group, including groups
with BMEs and approximately bi-allelic markers.

Similarly, a set of oligonucleotides that is complementary to a group of bi-allelic markers has utility to
obtain sample allele frequency data for each allele of each marker in the group.

In both cases, obtaining genotype data or sample allele frequency data, the same principle is
used: a set of oligonucleotides that is complementary to a group of bi-allelic markers has utility
to determine the presence or absence of each allele of each marker in the group in
chromosomal DNA.

Using sets of oligonucleotides to obtain Genotype Data/Sample Allele Frequency Data for each
marker of a group of bi-allelic markers, wherein the group of markers systematically cover a CL-F
region

Genotype data/sample allele frequency data for each marker of a group of bi-allelic markers, wherein
the group of bi-allelic markers systematically cover a CL-F region, has great utility for use in the more
powerful two-dimensional linkage studies introduced in this application. As described above under
Oligonucleotide Technology, some sets of oligonucleotides have utility to determine genotype data at
each bi-allelic marker of a group of one or more bi-allelic markers. Similarly, some sets of
oligonucleotides have utility to obtain sample allele frequency data for each bi-allelic marker of a group
of one or more bi-allelic markers. Therefore, the use of one or more copies of a set of oligonucleotides
to obtain genotype data or sample allele frequency data for each bi-allelic marker of a group of one or
more bi-allelic covering markers, wherein the group of bi-allelic covering markers systematically cover a
CL-F region, has great utility.

A word to avoid confusion in terminology: in this application, a set of markers for use in genotyping is
referred to as a set of oligonucleotides.

A set of oligonucleotides consisting of one or both strands of each allele of a group of one or more
markers is a set of oligonucleotides that is complementary to the group of markers (see definitions
section). Such a set of oligonucleotides is in effect the group of markers themselves; and such a set of
oligonucleotides has utility to determine genotype data at each marker in the group. So a group of
markers (or set of markers) for use in obtaining genotype data or sample allele frequency data for each
of the markers in the group is included in the descriptive phrase: "a set of oligonucleotides".

Description of Use set#1D:

Use set#1D: The use of one or more copies of a set of oligonucleotides to determine genotype
data/sample allele frequency data for each bi-allelic marker of a group of two or more bi-allelic
covering markers for one or more individuals, wherein the group of covering markers
systematically cover a CL-F region.

The CL-F region and covering markers are for a species and the one or more individuals are members
of the species. An example of a set of oligonucleotides with utility to be used to determine genotype
data/sample allele frequency data for each bi-allelic marker of a group of two or more bi-allelic covering
markers is a set of oligonucleotides that is complementary to the group of markers. A set that is
complementary to the group of markers is used to detect the presence or absence of each of the alleles of
the covering markers by means of a hybridization reaction as discussed under oligonucleotide
technology. Thus a set that is complementary to the group of markers is used to determine genotype
data/sample allele frequency data for each covering marker.

The use of one or more copies of a set of oligonucleotides to obtain genotype data, or to obtain sample
allele frequency data, for each bi-allelic marker of a group of one or more bi-allelic covering markers, wherein
the group of bi-allelic covering markers systematically cover a CL-F region, are both examples of this
version of the invention (Use set#1D).

In this application, the systematic covering of a CL-F region in versions of the invention is described
mathematically as the covering of a CL-F region, wherein the CL-F region is N covered to within a CL-F
distance δ by two or more bi-allelic covering markers. For further details regarding this, see Detailed
Description of the Systematic Covering of a CL-F Region Used In Versions of the Invention above.

Example IS of Use set#1D: The use in genotyping one or more individuals, of one or more
copies of a set of oligonucleotides, the set of oligonucleotides being complementary to a group of two or
more bi-allelic covering markers, a CL-F region being N covered by the covering markers to within a
CL-F distance of about [250,000 bp, 0.1] or the equivalent thereof, wherein N is an integer greater than
or equal to two.

Composition of matter: Description of Comp set#1D:

Comp set#1D: One or more copies of a set of oligonucleotides, the set of oligonucleotides being
complementary to a group of two or more bi-allelic covering markers, wherein the group of
covering markers systematically cover a CL-F region.

A set of oligonucleotides that is complementary to a group of two or more bi-allelic covering markers,
wherein the group of covering markers systematically cover a CL-F region, has great utility for use in the
two-dimensional linkage study techniques introduced in this application. Such a set has utility in being
used to genotype individuals or obtain sample allele frequency data or genotype data/sample allele
frequency data as described above under Use set#1D. In this application, the systematic covering of a
CL-F region in versions of the invention is described mathematically as the covering of a CL-F region,
wherein the CL-F region is N covered to within a CL-F distance δ by two or more bi-allelic covering
markers. For further details regarding this, see Detailed Description of the Systematic Covering of a
CL-F Region Used In Versions of the Invention above.

Example 1Comp of Comp set#1D:

Example 1Comp: One or more copies of a set of oligonucleotides, the set of oligonucleotides being
complementary to a group of two or more bi-allelic covering markers, a CL-F region being N covered by
the covering markers to within a CL-F distance of about [1cM, 0.2] or the equivalent thereof, wherein N
is an integer greater than or equal to one.

Redundancy of Covering Markers

Some versions of the invention make use of N coverings of CL-F regions by covering markers which
limit (possibly to zero) the number of pairs of covering markers which are redundant within CL-F
distance D, D = [Dcl, Df], wherein D is less than or equal to about δ, a CL-F covering distance. This
limits the number of covering markers which are separated by a CL-F distance of less than or equal to D (if
the markers were placed on a CL-F map) which will be in extreme positive disequilibrium with each
other. This limitation is done by requiring that less than or equal to R pairs of covering markers are
redundant within distance D, wherein R is an integer greater than or equal to 0 and less than or equal
to about N(N-1)/2. When R is chosen to be zero, no pair of covering markers is redundant within
distance D.
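
The limit on redundant pairs can be expressed as a simple count. The Python sketch below is only an
illustration under assumptions: markers are represented as (chromosomal location, least common allele
frequency) pairs, two markers are taken to be within CL-F distance D = [Dcl, Df] when both coordinate
differences are within the corresponding component, and the test for extreme positive disequilibrium
is a caller-supplied function, since this section does not give a formula for it.

```python
from itertools import combinations


def within_clf_distance(m1, m2, d_cl, d_f):
    """True when two (location, frequency) points differ by at most [d_cl, d_f]."""
    return abs(m1[0] - m2[0]) <= d_cl and abs(m1[1] - m2[1]) <= d_f


def count_redundant_pairs(markers, d_cl, d_f, in_extreme_disequilibrium):
    """Count pairs that are within distance D and flagged by the supplied predicate."""
    return sum(
        1
        for m1, m2 in combinations(markers, 2)
        if within_clf_distance(m1, m2, d_cl, d_f) and in_extreme_disequilibrium(m1, m2)
    )


def satisfies_redundancy_limit(markers, d_cl, d_f, in_extreme_disequilibrium, r):
    """True when at most r pairs of covering markers are redundant within distance D."""
    return count_redundant_pairs(markers, d_cl, d_f, in_extreme_disequilibrium) <= r
```

Choosing r = 0 in this sketch corresponds to the case in which no pair of covering markers is
redundant within distance D.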

A preferable condition is that each bi-allelic covering marker within each small CL-F region (a small
segment-subrange of length about δcl and width about δf, the distance components of the covering
distance δ) provides much new (i.e. non-redundant) information about linkage and association to any
nearby bi-allelic gene. Under these conditions, testing each bi-allelic covering marker in each small
CL-F region increases the likelihood of detecting linkage to a gene.

Limiting (including to zero) pairs of covering markers which are redundant within CL-F distance D (which
is less than or equal to a covering distance δ) approaches and achieves this preferable condition. This
limitation is not crucial to the functioning of a version of the invention; however, it has the advantage of
reducing excess effort and increasing efficiency.

Polymorphism CL-F Display

Polymorphism CL-F display apparatus display the chromosomal location, least common allele
frequency and identity of each polymorphism of one or more polymorphisms (markers or genes or both)
of one or more populations of one or more species on one or more two-dimensional graphs, each graph
being similar to an x-y plot. The apparatus has utility including aiding in decisions regarding linkage studies
and the interpretation of linkage study data.

The apparatus comprise means to display the chromosomal location, least common allele frequency
and identity of each polymorphism of one or more polymorphisms (markers or genes or both) of one or
more populations of one or more species on one or more two-dimensional graphs, each graph being similar
to an x-y plot.

Each graph has two axes: one axis, the frequency axis, represents least common allele frequency, and
the alternate (or other) axis, the chromosomal location axis, represents chromosomal location. Each
frequency axis of each graph is in units of population frequency. Each chromosomal location axis of
each graph is in units of chromosomal location such as centimorgans, base pairs or the equivalent
thereof.

The frequency axis of each graph spans the entire range 0 to 0.5 or a subrange of the range 0 to 0.5.
The chromosomal location axis of each graph spans the chromosomal locations on one or more
segments of one or more chromosomes of a species; each of the one or more segments is a size from
the equivalent of a base pair in length to the length of an entire chromosome (or the equivalent thereof).

Each point on each graph is directly opposite a value on the frequency axis of each graph. The value
on the frequency axis directly opposite each point on each graph is the frequency coordinate of each
point on each graph. Each point on each graph is directly opposite a value on the chromosomal location
axis of each graph. The value on the chromosomal location axis directly opposite each point on each
graph is the chromosomal location coordinate of each point on each graph.

Each graph displays the chromosomal location and least common allele frequency of each
polymorphism of one or more polymorphisms. Each polymorphism displayed on each graph is assigned
a graph location on each graph.

The graph location of each polymorphism displayed on each graph is typical of the use of x-y plots. The
graph location assigned to each polymorphism on each graph is a point. The chromosomal location
coordinate of the point assigned as the graph location to any one polymorphism is equal (or
approximately equal) to the chromosomal location of the polymorphism. And the frequency coordinate
of the point assigned as the graph location to any one polymorphism is equal (or approximately equal)
to the least common allele frequency of the polymorphism.
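
The assignment of graph locations described above can be illustrated with a minimal Python sketch. It is not the apparatus itself. The class name `Polymorphism`, its fields, the use of centimorgans, and the helper names are assumptions made for illustration, and the sketch assumes bi-allelic polymorphisms so that the least common allele frequency is the smaller of f and 1 - f.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polymorphism:
    """A bi-allelic polymorphism (all field names are illustrative)."""
    identity: str        # e.g. a marker name
    location_cm: float   # chromosomal location, assumed here to be in centimorgans
    allele_freq: float   # population frequency of either one of the two alleles


def graph_point(p: Polymorphism) -> tuple[float, float]:
    """Return (chromosomal location coordinate, frequency coordinate)."""
    # For two alleles, the least common allele frequency is the smaller of
    # f and 1 - f, so the frequency coordinate always lies between 0 and 0.5.
    least_common = min(p.allele_freq, 1.0 - p.allele_freq)
    return (p.location_cm, least_common)


def points_in_view(polys, loc_span, freq_span=(0.0, 0.5)):
    """Keep only the points that fall inside the viewer-chosen axis spans."""
    visible = {}
    for p in polys:
        x, y = graph_point(p)
        if loc_span[0] <= x <= loc_span[1] and freq_span[0] <= y <= freq_span[1]:
            visible[p.identity] = (x, y)
    return visible
```

A viewer-chosen span of the chromosomal location axis, for example 0 to 20 cM, is passed as `loc_span`; the returned mapping gives the identity and graph location of each polymorphism that would appear on that graph.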

The apparatus comprise means for displaying one or more two-dimensional graphs. Each graph
comprises the identity and graph location of one or more polymorphisms assigned a location on each
graph. And the apparatus comprise means for displaying one or more graphs wherein the viewer
chooses the species, population, polymorphisms, span of the frequency axis and span of the
chromosomal location axis of the one or more graphs; in versions of the apparatus, the means of this
sentence comprises a computer.

The apparatus comprise means for storing and updating data on the chromosomal location and least
common allele frequency of one or more polymorphisms of one or more populations of one or more
species and means for storing chromosomal location and least common allele frequency data on newly
discovered polymorphisms.

Versions of the apparatus comprise means for printing each of the one or more graphs.
Theory of Operation / Best Mode 

Systematically Varying Both Marker Chromosomal Location and Marker Allele Frequency of Markers in
Linkage Studies

The inventor's calculations and observations have demonstrated the increased power of the TDT in
more common, less optimal situations when a bi-allelic marker and bi-allelic gene have (1) similar but
not identical allele frequencies and (2) the marker and gene are in some degree of linkage
disequilibrium. Thus, for a typical linkage study using bi-allelic markers and an association-based
linkage test, to increase the likelihood of both criteria (1) and (2) occurring for one or more
markers, so as to increase the power of an association-based linkage test in a linkage study, the
bi-allelic markers used in the study are chosen so that the least common allele frequencies of
the markers vary systematically over a range or subrange of least common allele frequency
AND the chromosomal locations of the markers vary systematically over one or more
chromosomes or chromosomal regions. And the bi-allelic markers are chosen so that the
markers' chromosomal locations and least common allele frequencies vary systematically in an
essentially independent manner.

(In the Theory of Operation/Best Mode Section the traditional symbol used in scientific papers for the
disequilibrium coefficient, δ, is used. This should not be confused with the symbol δ used for the
covering distance in the remainder of the application. The symbol d is used for the disequilibrium
coefficient in the sections of the application other than the Theory of Operation/Best Mode Section.)

The theory of operation is based on the mathematical observation that the TDT and other
association-based tests for linkage are increased in power as the frequencies of the disease-causing
allele of a bi-allelic gene and the positively associated allele of a linked bi-allelic marker become
similar in magnitude. The inventor made this observation as a result of deriving the equation shown
below for Pt (this is Equation 2 in the unpublished manuscript submitted for publication in December
1996 and in the published paper by RE McGinnis in the Annals of Human Genetics vol 62, pp. 159-179,
1998).

Pt = .5 + (1 - 2θ) [remainder of the expression is illegible in the source]

Equation 2

Pt may be regarded as the size of the "signal" which is given by the TDT to indicate that a tested
marker is linked to a disease-causing gene. The more Pt is elevated above 0.5 (baseline), the greater is
the evidence for linkage or "power" provided by the association-based linkage test known as the TDT.
Table 2 in the unpublished manuscript filed with previous US Provisional Patent
Applications (see below) illustrates how signal strength increases substantially as the frequencies of
disease-causing allele and positively associated marker allele become similar in magnitude. As noted
on pages 24 and 25 of the unpublished manuscript (see below), Table 2 assumes that the frequency (p)
of the disease-causing allele is fixed at p=.1 while the frequency (m) of the positively associated marker
allele varies (m=.5, .3, .2, .1, .05). Note that when the level of disequilibrium (or association) between
the bi-allelic marker and bi-allelic disease gene is fixed (in this case either δ=δmax or ½δmax), the
signal strength of Pt progressively increases as m decreases from m=.5 to m=.1 (the same frequency
as the disease allele, i.e., p=.1). For example, in the section of Table 2 for r=5, note that when
δ=½δmax, Pt = .548 at m=.5 and then steadily increases to .572 (m=.3), .597 (m=.2), .648 (m=.1) and then
starts to decrease again as m departs from m=p=.1 (i.e., Pt = .636 at m=.05). As noted on pages 24-25
(below) of the unpublished manuscript, the TDT chi-square statistic (assuming a sample size of 200
families) is such that the signal strength at m=.5 (Pt = .548) does not produce statistically significant
evidence for linkage (p-value > 0.05) while the doubling of signal strength at m=.2 (Pt = .597) produces
very strong statistical evidence for linkage by the TDT (p-value < 0.005). This sort of substantial
increase in power is also true of other association-based linkage tests as the frequencies of the
disease-causing allele and associated marker allele become more similar in magnitude.

Table 2 (Footnotes for Table 2 follow the table)

Effect of penetrance ratio (r), disequilibrium (δ) and marker heterozygosity (m) on magnitude
of Pt and Ps

                   Magnitude of Pt                  Magnitude of Ps
r       m          δmax(a)  ½δmax(b)  δ=0           δmax(a)  ½δmax(b)  δ=0

r=2     m=.5       .526     .513      .500          .505     .505      .504
        m=.3       .541     .521      .500          .508     .506      .504
        m=.2       .558     .531      .500          .511     .508      .504
        m=.1       .595     .555      .500          .518     .512      .504
        m=.05      .589     .552      .500          .517     .511      .504

r=5     m=.5       .596     .548      .500          .543     .540      .539
        m=.3       .633     .572      .500          .561     .548      .539
        m=.2       .666     .597      .500          .575     .556      .539
        m=.1       .719     .648      .500          .600     .573      .539
        m=.05      .696     .636      .500          .589     .571      .539

r=10    m=.5       .656     .577      .500          .595     .587      .584
        m=.3       .702     .612      .500          .623     .600      .584
        m=.2       .736     .644      .500          .644     .612      .584
        m=.1       .785     .703      .500          .673     .635      .584
        m=.05      .750     .684      .500          .652     .628      .584

r=∞     m=.5       .740     .617      .500          .700     .680      .673
        m=.3       .791     .663      .500          .743     .700      .673
        m=.2       .826     .703      .500          .772     .716      .673
        m=.1       .870     .770      .500          .807     .744      .673
        m=.05      .816     .741      .500          .763     .730      .673

Footnotes for Table 2

(a), (b) Value of δ that is maximal (δmax) and half-maximal (½δmax), as determined by the
heterozygosity of the marker (m) and disease locus (p=.1)



Importance of disequilibrium and marker heterozygosity (i.e. marker allele frequency) in
detecting linkage

When the heterozygosity (i.e. allele frequencies) of a bi-allelic marker and bi-allelic
disease locus are fixed, (Ps - .5) and |Pt - .5| are both maximized at the most positive or most
negative possible value of δ (δmax, δmin), as demonstrated in the published paper. This
maximization of χ²asp and χ²tdt is intimately connected to Ms and Mt (defined in equations 1
and 2) since: (a) these are the only two factors in Ps and Pt that are influenced by δ and (b) Ms
and |Mt| are maximal and equal to each other when δ is extreme (δmax or δmin). Furthermore, as
explained in the published paper, Ms is a measure of the proportion of informative (A/B)
parents who are also informative (D/d) at the disease locus. Therefore, maximizing Ms (and,
by implication, |Mt|) is equivalent to minimizing the proportion of A/B parents who are
homozygous (D/D or d/d) at the disease locus. Such homozygous D/D or d/d parents
contribute evidence against linkage since they transmit marker alleles A and B to affected
offspring with equal probability; thus, minimizing their proportion among A/B parents being
tested for linkage has the effect of maximizing χ²asp and χ²tdt.

Nevertheless, when bi-allelic markers have a specific (i.e. fixed) heterozygosity
different from that of a bi-allelic disease locus, some A/B parents must be homozygous at the
disease locus, even when δ is extreme. But if marker heterozygosity is variable, the proportion
of A/B parents who are D/D or d/d approaches zero as marker heterozygosity approaches that
of the disease locus and as δ approaches δmax or δmin. Consequently, the most extreme values
of Pt and Ps, and highest values of χ²tdt and χ²asp, are found when marker and disease locus
have equal heterozygosity and δ=δmax or δ=δmin.

Example illustrating the importance of marker heterozygosity (i.e. allele frequency)

To illustrate the importance of marker heterozygosity and disequilibrium, Table 2
shows Pt and Ps values when the frequency (p) of disease allele D is constant at 0.1, but the
frequency (m) of marker allele A varies between m=.5 (maximum marker heterozygosity) and
m=.1 (equal heterozygosity at marker and disease loci). The table assumes mode of
inheritance is additive, and separate sections of the table show the results when the penetrance
ratio (r) is 2, 5, 10 or ∞. For each value of r, an individual line in the table represents constant
marker heterozygosity (m=.5, .3, .2, or .1) and from left to right on each line, one sees Pt and
Ps values when δ=δmax, ½δmax, and δ=0, the value of δmax being determined by the particular
values of m and p [δmax=p(1-m)]. As noted in Appendix I of the published paper, when p<m
and p<(1-m), as in this example, the most extreme values of Pt and Ps must occur at δ=δmax.
This can be seen in each line of the table by the steady increase in both Pt and Ps as one moves
from δ=0 to δ=δmax, with every line also showing Pt > Ps at δ=δmax and most lines showing Pt >
Ps at δ=½δmax.

Most remarkable, however, are the sizeable increases in Ps and even greater increases
in Pt as marker heterozygosity drops toward the heterozygosity of the disease locus (m→.1). A
typical example is at r=5 and δ=½δmax where the table shows Pt=.548 at maximum marker
heterozygosity (m=.5) and Pt=.597 or .648 for m=.2 or .1, respectively. The impact of such an
increase in Pt can be understood by calculating χ²tdt for Pt=.548 (m=.5) and for Pt=.597 (m=.2)
assuming a data set of 200 families each with two affected sibs. Based on the expression for
[symbol illegible in the source], I calculate the proportion of A/B parents to be .50 and .39 when
m=.5 and .2, respectively. So for m=.5, there would be .5 x 400 x 2 = 400 informative transmissions
to affected offspring with transmissions of allele A totaling .548 x 400 = 219, thus implying
χ²tdt = 38²/400 = 3.61, p<0.1. For m=.2, there would be .39 x 400 x 2 = 312 informative
transmissions of which .597 x 312 = 186 would be transmissions of allele A, yielding
χ²tdt = 60²/312 = 11.54, p<0.005.
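
The arithmetic of this example can be checked with a short Python sketch. It assumes the standard form of the TDT statistic, (b - c)²/(b + c) with one degree of freedom, where b and c are the numbers of transmissions of alleles A and B from heterozygous parents. The function names and the rounding of transmission counts to whole numbers are choices made for illustration only.

```python
import math


def informative_transmissions(families: int, prop_ab: float, sibs: int = 2) -> int:
    """Transmissions from A/B parents to affected offspring.

    Each family contributes two parents; a proportion prop_ab of them is
    heterozygous (A/B), and each such parent transmits once per affected sib.
    """
    parents = 2 * families
    return round(prop_ab * parents * sibs)


def tdt_chi_square(n_informative: int, p_t: float):
    """Return (transmissions of allele A, chi-square, p-value)."""
    b = round(n_informative * p_t)   # transmissions of allele A
    c = n_informative - b            # transmissions of allele B
    chi2 = (b - c) ** 2 / (b + c)    # TDT statistic, 1 degree of freedom
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return b, chi2, p_value


for m, prop_ab, p_t in [(0.5, 0.50, 0.548), (0.2, 0.39, 0.597)]:
    n = informative_transmissions(200, prop_ab)
    b, chi2, p = tdt_chi_square(n, p_t)
    print(f"m={m}: {n} transmissions, {b} of allele A, chi2={chi2:.2f}, p={p:.4f}")
```

Under these assumptions the sketch reproduces the counts 400 and 219 for m=.5 and 312 and 186 for m=.2, with chi-square values of 3.61 and about 11.54.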

This example is typical, and highlights perhaps the most important finding of this
paper; namely the importance of using bi-allelic markers with heterozygosity similar to that of
a bi-allelic disease locus. Indeed, since a majority of susceptibility loci may be bi-allelic, the
judicious use of bi-allelic markers of high, medium, and low heterozygosity may be
crucial in order to initially detect and replicate linkages to loci conferring modest disease risk.

WO 99/43858 PCT/US99/04376
Best Mode:

Method for locating disease causing polymorphism using biallelic linkage
analysis

Objective: To test, by association-based linkage analysis (e.g., by TDT), whether a
disease-causing polymorphism is located on a particular chromosome (e.g., human
chromosome 4) or within a particular subregion of that chromosome.

PART 1 - Steps in conducting the association-based linkage test

Step 1

To conduct the test, first divide the chromosome or subregion of interest into segments
that are short enough that polymorphisms within each segment are likely to be in linkage
disequilibrium with each other. The division of a chromosome or subregion of interest into
"segments" is conceptual (not physical) and is based on chromosomal maps such as those
provided by the Whitehead Institute or Marshfield Foundation for Biomedical Research.
Although disequilibrium has been observed in Finnish populations between polymorphisms
that are 7 to 10 centimorgans (cM) apart, the chromosomal segments for searching for
disease-causing polymorphisms in more genetically heterogeneous populations should be less than 1
cM long (e.g., 250,000 base pairs long). These chromosomal segments might or might not
overlap each other (i.e., share some of their length in common); but the set of chromosomal
segments should completely cover the entire chromosome or entire subregion of interest, so
that a disease-causing polymorphism located anywhere on the chromosome or anywhere in the
subregion of interest will be detected by the test.
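
The conceptual division of Step 1 can be sketched in Python as follows. This is only one possible way to lay out the segments. The function name, the use of base-pair coordinates, and the fixed overlap parameter are assumptions for illustration; the 250,000 base pair default is the example length given above.

```python
def make_segments(start_bp: int, end_bp: int,
                  max_len_bp: int = 250_000, overlap_bp: int = 0):
    """Return (start, end) segments that together cover start_bp..end_bp."""
    if not 0 <= overlap_bp < max_len_bp:
        raise ValueError("overlap must be non-negative and shorter than a segment")
    segments = []
    step = max_len_bp - overlap_bp   # distance between successive segment starts
    pos = start_bp
    while pos < end_bp:
        seg_end = min(pos + max_len_bp, end_bp)
        segments.append((pos, seg_end))
        if seg_end == end_bp:        # region fully covered
            break
        pos += step
    return segments


# Non-overlapping and overlapping layouts of a 600,000 bp subregion.
print(make_segments(0, 600_000))
print(make_segments(0, 600_000, overlap_bp=50_000))
```

Because each segment starts no later than the end of the previous one, the segments leave no gap, which is the coverage property Step 1 requires.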

Step 2

It is well known that increased disequilibrium between a marker and linked disease
locus increases evidence for linkage provided by association-based linkage tests such as the
TDT. However, what has not been recognized is that the specific allele frequencies of the
marker locus can also have an enormous impact on the strength of evidence for linkage. I
showed this by analyzing equation 2 for Pt. Specifically, when a bi-allelic marker is in linkage
disequilibrium with a bi-allelic disease locus, the strength of evidence for linkage provided by
the TDT is greatly increased if the bi-allelic marker and bi-allelic disease locus have similar
allele frequencies.

This phenomenon is illustrated by Table 2 and explained above. For example, suppose,
as noted above, that the susceptibility allele ("allele D") of a bi-allelic disease locus has a
frequency of 0.1 and further suppose that the disease locus is in half-maximal positive
disequilibrium with a bi-allelic marker (δ=½δmax). As noted above, χ²tdt will equal only 3.61
(p<0.1) if the frequency of the less common marker allele is 0.5; but if the frequency of the
less common marker allele is 0.2 (and hence much closer to the frequency of allele D) then
χ²tdt will equal 11.54, thus providing much stronger evidence for linkage (p<0.005).

Therefore, in searching for association-based linkage to a bi-allelic disease locus within
each of the aforementioned chromosomal segments (see step 1), it is crucial to identify and test
(e.g., by TDT) bi-allelic markers within each segment that have a broad range of allele
frequencies. An unidentified bi-allelic disease locus could have allele frequencies close to
0.5/0.5, 0.4/0.6, 0.3/0.7, 0.2/0.8, 0.1/0.9 or below 0.1/above 0.9; hence, it is crucial to test
bi-allelic markers with frequencies near 0.5/0.5 and near 0.1/0.9 as well as test others with allele
frequencies that fall at regular increments between the extremes of 0.5/0.5 and 0.1/0.9. By
testing bi-allelic markers with a broad range of allele frequencies that are spaced at regular
intervals between 0.5/0.5 and 0.1/0.9, one is assured of testing some bi-allelic markers whose
two allele frequencies are reasonably close to the allele frequencies of an unknown bi-allelic
disease locus.

Thus, for step 2, within each chromosomal segment, subsets of bi-allelic markers
should be identified. Each subset contains only bi-allelic markers having approximately the
same allele frequencies. For example, subset A contains only markers whose less common
allele has a population frequency of about 0.1. Similarly, subsets B, C, D, and E contain only
bi-allelic markers whose less common allele has a frequency of approximately 0.2, 0.3, 0.4,
and 0.5, respectively. In other versions of the invention the number of subsets is greater or less
than five, and the approximate frequency of the less common allele of the subsets is other
than about 0.1, 0.2, 0.3, 0.4 or 0.5 and is expected to be more than one decimal long since
allele frequencies from real populations are rarely round numbers. However, the crucial point
is that each subset should contain only bi-allelic markers belonging to one chromosomal
segment and the frequency of the less common allele of each subset member should be
approximately the same (i.e., the difference between the frequencies of the less common allele
of any two subset members should not exceed 0.15). Also crucial, as I emphasized above, is
that the group of subsets for each chromosomal segment represent frequencies near the
extremes of 0.5/0.5 and 0.1/0.9 as well as represent bi-allele frequencies between these two
extremes that are approximately evenly spaced as illustrated by the group of subsets referred to
above as A, B, C, D and E.
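
One way to form the subsets of Step 2 for a single chromosomal segment is sketched below in Python. The function name, the nearest-target grouping rule, and the input format (a mapping from marker name to least common allele frequency) are assumptions for illustration. The target frequencies 0.1 to 0.5 and the 0.15 limit on the spread within a subset are taken from the text above.

```python
def assign_subsets(markers, targets=(0.1, 0.2, 0.3, 0.4, 0.5), max_spread=0.15):
    """Group the markers of one segment into subsets A, B, C, ...

    markers: dict of marker name -> least common allele frequency (0 to 0.5).
    Each marker joins the subset whose target frequency is nearest to it.
    """
    labels = [chr(ord("A") + i) for i in range(len(targets))]
    subsets = {label: [] for label in labels}
    for name, freq in markers.items():
        nearest = min(range(len(targets)), key=lambda i: abs(targets[i] - freq))
        subsets[labels[nearest]].append(name)
    # Guard for the stated criterion: no two members of a subset may differ
    # in least common allele frequency by more than max_spread.
    for label, names in subsets.items():
        freqs = [markers[n] for n in names]
        if freqs and max(freqs) - min(freqs) > max_spread:
            raise ValueError(f"subset {label} spans more than {max_spread}")
    return subsets


segment_markers = {"m1": 0.11, "m2": 0.09, "m3": 0.31, "m4": 0.48, "m5": 0.22}
print(assign_subsets(segment_markers))
```

An empty subset in the output (subset D in this example) indicates a frequency range for which additional markers would need to be identified in that segment.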

Step 3

In step 2, I described the importance of testing subsets of bi-allelic markers having
approximately the same frequencies for their two alleles. Here I further delineate the
characteristics of the markers that should be chosen for each subset by noting why it is
important that each subset contain more than one bi-allelic marker. Even though a particular
bi-allelic marker has allele frequencies that are similar to those of a closely linked bi-allelic
disease locus, the marker may not be in strong positive disequilibrium with the disease locus.
If disequilibrium is minimal, the marker will not show strong evidence for linkage under the
TDT or any other association-based linkage test, even though the bi-allelic marker and disease
locus have similar allele frequencies.

Hence, it is important that each subset contain multiple bi-allelic markers so that there
is increased likelihood that at least one of the markers will be in reasonably strong
disequilibrium with a closely linked bi-allelic disease locus. Beyond the cardinal criterion that
all bi-allelic markers in a subset have similar allele frequencies, an additional criterion for
selecting markers to belong to a subset is that the chosen bi-allelic markers should not be in
extreme positive disequilibrium with each other.

The reason for this is as follows: According to standard usage, the disequilibrium
coefficient (δ) is defined by the equation δ=f(AB) - f(A)f(B) where f(A) and f(B) may be
defined as the frequencies of the less common allele (denoted A and B) of two bi-allelic loci
belonging to the same subset and f(AB) is the population frequency of the AB haplotype.
Since the two markers belong to the same subset, we may assume that f(A)=f(B)=q; hence the
maximum positive value of δ (denoted δmax) is δmax=q-q². This maximum positive δ value (i.e.
maximum "positive disequilibrium") occurs if every chromosome that carries allele A also
carries allele B, and if every chromosome that carries allele not-A also carries allele not-B.
Hence, when two bi-allelic markers with similar allele frequencies are in extreme positive
disequilibrium with each other (i.e., δ is approximately equal to δmax), the two loci provide
nearly identical information with respect to their linkage and association with a third
polymorphism such as a disease locus. Hence one of the two bi-allelic markers would provide
no additional information and its inclusion in the subset would not increase the likelihood of
detecting linkage and association to a nearby disease locus.

Therefore, bi-allelic markers belonging to the same chromosomal segment and subset
should not only have similar allele frequencies; the δ value between each pair of bi-allelic
markers in the same subset should also be substantially less than δmax=q-q². This assures that
every bi-allelic polymorphism belonging to the subset provides much new (i.e. non-redundant)
information about linkage and association to any nearby bi-allelic disease locus; thus testing
each bi-allelic marker in the subset would increase the likelihood of detecting linkage to a
disease locus.
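
The redundancy criterion of Step 3 can be expressed as a small Python sketch using the definitions above, δ=f(AB) - f(A)f(B) and δmax=q-q² when f(A)=f(B)=q. The function names are illustrative, and the cutoff `fraction` of 0.8 is a hypothetical value; the text only requires that δ be substantially less than δmax and does not state a numerical threshold.

```python
def disequilibrium(f_ab: float, f_a: float, f_b: float) -> float:
    """Disequilibrium coefficient: delta = f(AB) - f(A) * f(B)."""
    return f_ab - f_a * f_b


def delta_max_equal_freq(q: float) -> float:
    """Maximum positive delta when both markers have frequency q."""
    return q - q * q


def is_redundant(f_ab: float, q: float, fraction: float = 0.8) -> bool:
    """Flag a marker pair whose delta is close to delta_max.

    fraction is a hypothetical cutoff chosen only for this illustration.
    """
    return disequilibrium(f_ab, q, q) >= fraction * delta_max_equal_freq(q)


q = 0.2
# Every chromosome carrying A also carries B: f(AB) = q, so delta = delta_max.
print(is_redundant(f_ab=0.2, q=q))    # pair carries nearly identical information
# Independent markers: f(AB) = q * q, so delta = 0.
print(is_redundant(f_ab=0.04, q=q))   # pair is non-redundant
```

A pair flagged in this way would contribute only one marker to the subset, since the second marker adds little new information about linkage.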

Step 4: Test for linkage

To test for (association-based) linkage to a bi-allelic disease locus, each bi-allelic
marker in each subset from each chromosomal segment is tested individually by using the
TDT, AFBAC method or other family-based linkage test. To conduct these tests for a
particular marker, members of nuclear families (most especially parents, and any children who
manifest disease) are genotyped at the marker being tested and the genotypes are then
evaluated according to the TDT, AFBAC method or other family-based linkage/association test
(for description of TDT and AFBAC, see Spielman et al. Am J of Human Genetics 52:506-516
(1993) and Thomson, Am J Human Genetics 57:487-498 (1995)). Alternatively, linkage and
association is tested for each marker in each subset from each segment by genotyping
individuals with disease and related or unrelated normal controls at each marker to be tested.

(End of best mode example)

Further Information

(Step 3 is not essential for the operation or utility of this version of the invention. In this best
mode example, the least common allele frequency subrange 0.1 to 0.5 is used. Versions of
the invention similar to the best mode are operable and have utility
for any subrange of the least common allele frequency range 0 to 0.5. In addition, rather than
genotyping DNA from single individuals in step 4, in some versions of the invention each
marker in each subset from each segment is tested for association with disease by evaluating
DNA from pooled samples.)

PART 2 - Physical implementation of the above test

Silicon chips or glass slides with arrays of oligonucleotides for testing bi-allelic markers

Companies like Affymetrix are using silicon chips or glass slides to genotype DNA
from one individual at thousands of bi-allelic markers. Each silicon chip or glass slide is
divided into a grid or 2-dimensional matrix that contains thousands of cells. To the surface of
each cell are attached multiple copies of a unique oligonucleotide whose sequence is
complementary (type (1)) to one of the two alleles of a particular bi-allelic marker. Thus, DNA
from an individual who carries the allele hybridizes to the cell with substantially greater
affinity than does the alternate bi-allele. The degree of hybridization generates a signal and
enables the genotype of the individual to be inferred for that particular bi-allelic polymorphism
[i.e., the individual is homozygous (++), heterozygous (+-), or homozygous (--)]. In some
applications, it is crucial to attach oligonucleotides corresponding to each allele of a bi-allelic
polymorphism in adjacent cells so that the relative (i.e. local) intensity of hybridization in the
adjacent cells can be compared, thus facilitating inference of the individual's correct genotype
(++, +-, or --).
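
The comparison of hybridization intensity in adjacent cells can be illustrated with a simple Python sketch. It is not a description of any vendor's genotype-calling method. The function name, the use of the "+" allele's share of the total signal, and the cutoff of 0.8 are hypothetical; a real array would require calibration against samples of known genotype.

```python
def call_genotype(signal_plus: float, signal_minus: float,
                  ratio_cutoff: float = 0.8) -> str:
    """Infer ++, +- or -- from the signals of two adjacent cells.

    signal_plus / signal_minus: hybridization intensity in the cell carrying
    the oligonucleotide for the "+" allele and for the "-" allele.
    ratio_cutoff: hypothetical share of total signal needed to call a homozygote.
    """
    total = signal_plus + signal_minus
    if total <= 0:
        raise ValueError("no hybridization signal in either cell")
    share_plus = signal_plus / total
    if share_plus >= ratio_cutoff:
        return "++"
    if share_plus <= 1.0 - ratio_cutoff:
        return "--"
    return "+-"


print(call_genotype(900.0, 50.0))    # signal almost entirely from the "+" cell
print(call_genotype(480.0, 520.0))   # comparable signal in both cells
print(call_genotype(40.0, 800.0))    # signal almost entirely from the "-" cell
```

Using the relative rather than the absolute intensity reflects the point made above: comparing adjacent cells compensates for local variation in hybridization across the chip or slide.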

In using this silicon chip or glass slide technology to test for linkage and association,
the ideas detailed in PART 1 indicate how the oligonucleotides that are attached to the cells of
the silicon chip or glass slide should be chosen. To scan a particular chromosome or
chromosomal region for a bi-allelic disease locus, the chromosome or chromosomal region
should be subdivided into segments as described in Step 1 above. For each segment, subsets of
bi-allelic markers having the properties detailed in PART 1 above should be identified. The
DNA of select individuals (see "Test for linkage" above) should then be assayed at each
bi-allelic marker in every chromosomal segment and in every subset of markers belonging to the
segment. This would be accomplished by attaching an oligonucleotide corresponding to one of
the marker's two alleles to a particular (i.e. known) cell on the silicon chip or slide. To
enhance assignment of accurate genotypes, it may also be advisable to attach an
oligonucleotide corresponding to the second allele of the bi-allelic marker in an adjacent cell as
mentioned in the previous paragraph.

Industrial Applicability

Versions of the present invention are useful for locating trait causing genes and polymorphisms such as
human disease genes and polymorphisms. Versions of the invention could be used to find the cure for
human disease. The making and use of versions of the invention should be clear to a person of skill in
the art after reading the description.

Scope of the Invention

While the description contains many specificities, these should not be construed as limitations on the
scope of the invention, but rather as exemplifications of versions of the invention.

Accordingly the scope of the invention should be determined not by the specific versions described
alone, but also by the claims and their legal equivalents and also by any future claims drawn to the
invention and future descriptions of versions of the invention.

Notes:

The reader's attention is directed to the following papers which are open to the public and are herein
incorporated by reference: (1) McGinnis, Ewens & Spielman, Genetic Epidemiology 1995; 12(6) 637-40.
(2) RE McGinnis, Annals of Human Genetics vol 62, pp. 159-179, 1998. The papers in the endnotes
below are incorporated herein by reference.

Weighing DNA for Fast Genetic Diagnosis. Science, March 27, 1998, vol. 279, pp. 2044-2045.

Spielman, R.S. and Ewens, W.J. The TDT and Other Family-Based Tests for Linkage
Disequilibrium and Association. American Journal of Human Genetics, 59: 983-989, 1996.

"Mathematical Theory of Optimization", The New Encyclopedia Britannica, 15th edition, vol.
25, pp. 217-221.

American Journal of Human Genetics, vol. 57: 439-454, 1995.

American Journal of Human Genetics, vol. 58: 1347-1363, 1996.

Human Heredity, vol. 44, pp. 225-237, 1994.

Human Heredity, vol. 46, pp. 226-235, 1996.

Accessing Genetic Information with High-Density DNA Arrays. Mark Chee, et al.,
Science, vol 274, Oct. 25, 1996, pp. 610-614.

Large Scale Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms
in the Human Genome. Wang, et al., Science, May 15, 1998, vol 280, pp. 1077-1081.

(1) Schuster, H. et al (1995) Nature Genetics, 13(1): 98-100.
(2) Gyapay, G. et al (1994) Nature Genetics, 7: 246-339.

Some versions of oligonucleotide technology and its uses to obtain genetic information are
included in the following papers:

(1) Accessing Genetic Information with High-Density DNA Arrays. Mark Chee, et al., Science,
vol 274, Oct. 25, 1996, pp. 610-614.

(2) Genetic analysis of amplified DNA with immobilized sequence-specific oligonucleotide
probes. Saiki, et al., Proc Natl Acad Sci USA vol 86, pp. 6230-6234.

(3) Allele-specific enzymatic amplification of β-globin genomic DNA for diagnosis of sickle
cell anemia. Wu, et al., Proc Natl Acad Sci USA vol 86, pp. 2757-2760.

(4) Automated DNA diagnostics using an ELISA-based oligonucleotide ligation assay.
Nickerson, et al., Proc Natl Acad Sci USA vol 87, pp. 8923-8927.

(5) Genetic analysis of amplified DNA with immobilized sequence-specific oligonucleotide
probes. Saiki, et al., Proc Natl Acad Sci USA vol 86, pp. 6230-6234.

Padlock Probes: Circularizing Oligonucleotides for Localized DNA Detection. Science,
Sept. 30, 1994, vol. 265, pp. 2085-2088.

SNP attack on complex traits. Nature Genetics, Nov. 1998, vol. 20 no. 3, pp. 217-218.

Claims



4 What is claimed 

5 1 . An invention substantially as described in the description 

6 2. An invention substantially as described and illustrated in the description. 

7 3. A process for identifying one or more bi-allelic markers linked to a bi-allelic genetic characteristic 

8 gene in a species of creatures, comprising the steps of : 

9 

10 a)choosing two or more bi-alielic covering markers so that a CL-F region is systematically covered by 

11 the two or more covering markers, the CL-F region being a collection of points on a two-dimensional 

12 plane, the two-dimensional plane having the two orthogonal dimensions of chromosomal location and 

13 least common allele frequency, 
14 

15 b)choostng a statistical linkage test based on allelic association for each covering marker; 
16 

17 c)choosing a sample of individuals for each covering marker , 
18 

19 d)obtaining genotype data/sample allele frequency data for each covering marker and the sample 

20 chosen for each covering marker, and obtaining phenotype status data for the genetic characteristic for 

21 each individual in the sample chosen for each covering marker, 

22 

23 e)calcuiating evidence for linkage between each covenng marker and the gene using the statistical 

24 linkage test based on allelic association chosen for each covering marker and the genotype 

25 data/sample allele frequency data for each covering marker and using the phenotype status data for the 

26 genetic characteristic for each individual in the sample chosen for each covering marker obtained in d); 

27 and 
28 

29 f) identifying those covering markers as linked to the genetic characteristic gene which show evidence

30 for linkage based on the calculations of step e).
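
Steps e) and f) of claim 3 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the chosen statistical linkage test is a Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts in affected and unaffected individuals; the claim itself does not fix a particular test. The function names, the example counts, and the 0.05 threshold are hypothetical, and no multiple-testing correction is applied.

```python
from math import erfc, sqrt


def allelic_association_test(case_counts, control_counts):
    """Pearson chi-square (1 d.f.) on a 2x2 table of allele counts.

    case_counts / control_counts: (count of allele 1, count of allele 2).
    Returns (chi_square, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column of the table needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


def linked_markers(tables, alpha=0.05):
    """Steps e) and f): test every covering marker, keep those with p < alpha.

    tables maps a marker name to (case_counts, control_counts).
    alpha is an illustrative threshold, not a value taken from the claims.
    """
    hits = []
    for name, (cases, controls) in tables.items():
        _, p = allelic_association_test(cases, controls)
        if p < alpha:
            hits.append(name)
    return hits


if __name__ == "__main__":
    example = {
        "marker_1": ((60, 40), (40, 60)),  # allele counts differ by status
        "marker_2": ((50, 50), (50, 50)),  # no difference
    }
    print(linked_markers(example))
```

In this sketch each covering marker is tested independently, which matches the per-marker wording of steps b) through e).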

31 4. A process as in claim 3, wherein the CL-F region is N covered to within a CL-F distance δ by the two

32 or more bi-allelic covering markers, so that each point in the region is within the CL-F distance δ of N or

33 more of the covering markers, wherein δ is equal to about [δcl, 0.25] or the equivalent thereof, δcl is

34 equal to the largest chromosomal length, computed by any method, for which linkage disequilibrium has 
1 been observed between any polymorphisms in any population of the species, N is an integer greater

2 than or equal to 1.
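
The N-covering condition of claim 4 can be sketched as follows. The sketch assumes that "within the CL-F distance" is read component-wise, so that a point is within a two-component distance of a marker when both its chromosomal-location difference and its allele-frequency difference are within the corresponding component. It also checks only a finite sample of points from the region. Both simplifications, and the function names, are assumptions made for illustration.

```python
def within_clf_distance(point, marker, delta):
    """True if point lies within the CL-F distance delta of marker.

    point, marker: (chromosomal location, least common allele frequency).
    delta: (location component, frequency component).
    Assumption: the distance is compared component by component.
    """
    return (abs(point[0] - marker[0]) <= delta[0]
            and abs(point[1] - marker[1]) <= delta[1])


def is_n_covered(points, markers, delta, n=1):
    """True if every sampled point has n or more markers within delta."""
    return all(
        sum(within_clf_distance(p, m, delta) for m in markers) >= n
        for p in points
    )


if __name__ == "__main__":
    markers = [(1.0, 0.1), (2.0, 0.4)]
    delta = (1.0, 0.25)
    print(is_n_covered([(1.5, 0.2)], markers, delta, n=2))
    print(is_n_covered([(4.0, 0.2)], markers, delta, n=1))
```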

3 5. A process as in claim 4, wherein the CL-F region includes 81 percent or more of the centerpoints of

4 the matrix centerpoint lattice of a CL-F matrix, the number of cells in the matrix being greater than or 

5 equal to three, wherein the matrix has R rows and C columns, each cell of the matrix being of length Lmc 

6 and width Wmc, and Lmc being less than or equal to about 2δcl, and Wmc being less than or equal to 0.5,

7 δcl is equal to the largest chromosomal length, computed by any method, for which linkage

8 disequilibrium has been observed between any polymorphisms in any population of the species, there 

9 being N or more covering markers in each cell of 81 percent or more of the cells of the matrix, N is an 

10 integer greater than or equal to 1; the covering markers being distributed over a chromosomal region of

11 interest, the region of interest being approximately the smallest chromosome interval that contains all of

12 the covering markers, and the covering markers comprising essentially less than all of the 

13 polymorphisms in the region of interest.
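
The cell-occupancy condition of claim 5 (N or more covering markers in each cell of 81 percent or more of the cells of the CL-F matrix) can be sketched as below. The orientation of the matrix (columns along chromosomal location, rows along least common allele frequency), the lower-left origin, and the treatment of markers on the upper edges are assumptions of this sketch, and the function names are hypothetical.

```python
def _cell_index(value, start, size, count):
    """Index of the cell containing value, or None if outside the matrix.

    A value exactly on the upper edge is placed in the last cell.
    """
    if value < start or value > start + size * count:
        return None
    return min(int((value - start) // size), count - 1)


def cell_occupancy_fraction(markers, origin, l_mc, w_mc, rows, cols, n=1):
    """Fraction of CL-F matrix cells that hold n or more markers.

    markers: (chromosomal location, least common allele frequency) pairs.
    origin:  (location, frequency) of the lower-left corner of the matrix.
    l_mc, w_mc: cell length (location axis) and cell width (frequency axis).
    """
    counts = [[0] * cols for _ in range(rows)]
    for loc, freq in markers:
        col = _cell_index(loc, origin[0], l_mc, cols)
        row = _cell_index(freq, origin[1], w_mc, rows)
        if row is not None and col is not None:
            counts[row][col] += 1
    filled = sum(1 for row in counts for c in row if c >= n)
    return filled / (rows * cols)


def meets_81_percent(markers, origin, l_mc, w_mc, rows, cols, n=1):
    """True if n or more markers lie in each cell of >= 81% of the cells."""
    frac = cell_occupancy_fraction(markers, origin, l_mc, w_mc, rows, cols, n)
    return frac >= 0.81


if __name__ == "__main__":
    markers = [(0.5, 0.1), (1.5, 0.1), (0.5, 0.3), (1.5, 0.5)]
    print(cell_occupancy_fraction(markers, (0.0, 0.0), 1.0, 0.25, 2, 2))
```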

14 6. A process as in claim 5, wherein the covering markers are substantially nonevenly distributed across 

15 a chromosome or a chromosomal segment. 

16 7. A process as in claim 5, wherein the covering markers are substantially evenly distributed across a 

17 chromosome or a chromosomal segment, and wherein there is a subgroup of one or more of the

18 covering markers, and each of the markers in the subgroup is chosen without substantial preference for 

19 the least common allele frequency of each of the markers in the subgroup being close to 0.5, and the 

20 number of covering markers in the subgroup is about 5 percent or more of the total number of covering 

21 markers. 

22 8. A process as in claim 5, wherein the covering markers are substantially evenly distributed across a 

23 chromosome or a chromosomal segment, and wherein there is a subgroup of one or more of the 

24 covering markers, and each of the markers in the subgroup is chosen without substantial preference for 

25 the least common allele frequency of each of the markers in the subgroup being close to 0.5. 

26 9. A process as in claim 5, wherein the covering markers are substantially evenly distributed across a 

27 chromosome or a chromosomal segment, wherein (1) the average chromosomal intermarker distance of

28 the covering markers is greater than 2 cM or the equivalent thereof and the least common allele 

29 frequency of one or more of the covering markers is less than 0.3, or wherein (2) the least common

30 allele frequency of one or more of the covering markers is less than 0.2, or wherein (3) the average

31 chromosomal intermarker distance of the covering markers is greater than 10 cM or the equivalent

32 thereof.

33 10. A process as in claim 5, wherein the covering markers are substantially evenly distributed across a 

34 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

35 the covering markers is less than or equal to 2 cM or the equivalent thereof, and wherein the conditional 
1 probability the covering markers were chosen essentially randomly from substantially the known set of

2 bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive is less

3 than or equal to 10 percent; wherein the conditional probability is substantially conditional on (1) the

4 approximate chromosomal distribution of the covering markers, (2) the marker type of each covering 

5 marker and (3) there being N or more covering markers in each cell of 81 percent or more of the cells of

6 the matrix. 

7 11. A process as in claim 5, wherein the covering markers are substantially evenly distributed across a

8 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

9 the covering markers is greater than 2 cM or the equivalent thereof; and wherein the conditional 

10 probability the covering markers were chosen essentially randomly from substantially the known set of 

11 bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive is less 

12 than or equal to 10 percent; wherein the conditional probability is substantially conditional on (1) the 

13 approximate chromosomal distribution of the covering markers, (2) the marker type of each covering

14 marker and (3) there being N or more covering markers in each cell of 81 percent or more of the cells of 

15 the matrix.

16 12. A process as in claim 10, wherein the chromosome or the chromosomal segment consists

17 essentially of a set of nonoverlapping chromosome segments of substantially equal length, and wherein 

18 each chromosome segment of the set has two or less covering markers located thereon, wherein one 

19 and only one covering marker is located on each of 80 percent or more of the chromosome segments of

20 the set, and wherein zero or two and only two covering markers are located on each of 20 percent or

21 less of the chromosome segments of the set, and wherein each chromosome segment that borders a 

22 chromosome segment with zero covering markers located thereon has only one or two covering 

23 markers located thereon, and wherein each chromosome segment that borders a chromosome segment 

24 with two covering markers located thereon has only one or zero covering markers located thereon; and 

25 wherein the conditional probability the covering markers were chosen essentially randomly from 

26 substantially the known set of bi-allelic markers with least common allele frequencies between 0.2 

27 inclusive and 0.5 inclusive is less than or equal to 10 percent; wherein the conditional probability is 

28 substantially conditional on (1) the chromosomal distribution of the covering markers on the 

29 chromosome segments of the set, (2) the marker type of each covering marker and (3) there being N or 

30 more covering markers in each cell of 81 percent or more of the cells of the matrix.
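
The segment-occupancy pattern recited in claim 12 can be checked mechanically once the number of covering markers on each of the equal-length, nonoverlapping segments is known. The sketch below is an illustration under the assumption that the per-segment counts are supplied in chromosomal order; the function name is hypothetical, and the conditional-probability limitation of the claim is not modeled.

```python
def segment_pattern_ok(counts):
    """Check per-segment covering-marker counts against the recited pattern.

    counts[i] is the number of covering markers on the i-th segment,
    listed in chromosomal order.
    """
    total = len(counts)
    # Each segment has two or less covering markers.
    if total == 0 or any(c < 0 or c > 2 for c in counts):
        return False
    # Exactly one marker on 80 percent or more of the segments; zero or
    # two markers on 20 percent or less then follows automatically.
    ones = sum(1 for c in counts if c == 1)
    if ones * 10 < total * 8:
        return False
    # A segment bordering a zero-marker segment has one or two markers,
    # and a segment bordering a two-marker segment has one or zero.
    for left, right in zip(counts, counts[1:]):
        if left == right and left in (0, 2):
            return False
    return True


if __name__ == "__main__":
    print(segment_pattern_ok([1, 1, 0, 1, 1, 2, 1, 1, 1, 1]))
    print(segment_pattern_ok([1, 0, 0, 1, 1, 1, 1, 1, 1, 1]))
```

Integer arithmetic is used for the 80 percent comparison to avoid floating-point rounding at the boundary.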

31 13. A process as in claim 11, wherein the chromosome or the chromosomal segment consists

32 essentially of a set of nonoverlapping chromosome segments of substantially equal length, and wherein 

33 each chromosome segment of the set has two or less covering markers located thereon, wherein one 

34 and only one covering marker is located on each of 80 percent or more of the chromosome segments of 

35 the set, and wherein zero or two and only two covering markers are located on each of 20 percent or 
1 less of the chromosome segments of the set, and wherein each chromosome segment that borders a

2 chromosome segment with zero covering markers located thereon has only one or two covering 

3 markers located thereon, and wherein each chromosome segment that borders a chromosome segment 

4 with two covering markers located thereon has only one or zero covering markers located thereon; and

5 wherein the conditional probability the covering markers were chosen essentially randomly from 

6 substantially the known set of bi-allelic markers with least common allele frequencies between 0.3 

7 inclusive and 0.5 inclusive is less than or equal to 10 percent, wherein the conditional probability is 

8 substantially conditional on (1) the chromosomal distribution of the covering markers on the

9 chromosome segments of the set, (2) the marker type of each covering marker and (3) there being N or

10 more covering markers in each cell of 81 percent or more of the cells of the matrix.

11 14. A process as in claim 5, wherein the covering markers are substantially evenly distributed across a

12 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

13 the covering markers is less than or equal to 2 cM or the equivalent thereof; and wherein collection C is 

14 essentially the collection of known groups of bi-allelic markers with least common allele frequencies 

15 between 0.2 inclusive and 0.5 inclusive, each group in the collection being substantially similar to the 

16 covering markers as a group; wherein a group of bi-allelic markers is a member of collection C if and 

17 only if the group substantially meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in

18 the group is chosen from substantially the known set of bi-allelic markers with least common allele 

19 frequencies between 0.2 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in 

20 the group is the same as the number of covering markers, wherein criterion (3) is, the chromosomal 

21 distribution of the group of markers and the covering markers is substantially similar, and wherein 

22 criterion (4) is, the marker type of each group marker and the covering marker with substantially the

23 same chromosomal location is the same; wherein a group that is a member of collection C meets 

24 criterion (5) if and only if (5) there are N or more of the group markers in each cell of 81 percent or more 

25 of the cells of a CL-F matrix with cells of length Lmc and width Wmc in R rows and C columns; wherein P 

26 is essentially the proportion of groups in collection C that meet criterion (5); wherein P is less than 90 

27 percent. 
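
The proportion P of claim 14 can be sketched by enumerating a collection of comparable marker groups and counting how many meet criterion (5). The sketch assumes, for illustration only, that the collection is built by choosing, for each covering-marker position, one known marker from a list of candidates with the same location and marker type, and that all candidates lie inside the CL-F matrix. The function names, the matrix parameters, and the example candidates are hypothetical.

```python
from itertools import product


def meets_criterion_5(group, origin, l_mc, w_mc, rows, cols, n=1):
    """True if n or more group markers lie in each cell of >= 81% of cells.

    group: (chromosomal location, least common allele frequency) pairs,
    all assumed to lie inside the matrix.
    """
    counts = {}
    for loc, freq in group:
        col = min(int((loc - origin[0]) // l_mc), cols - 1)
        row = min(int((freq - origin[1]) // w_mc), rows - 1)
        if col >= 0 and row >= 0:
            counts[(row, col)] = counts.get((row, col), 0) + 1
    filled = sum(1 for c in counts.values() if c >= n)
    return filled * 100 >= 81 * rows * cols


def proportion_p(candidates_per_position, **matrix):
    """Proportion of enumerated groups that meet criterion (5).

    candidates_per_position[i] lists the known markers that could stand
    in for the i-th covering marker; every combination is one group.
    """
    groups = list(product(*candidates_per_position))
    if not groups:
        raise ValueError("no candidate groups supplied")
    hits = sum(1 for g in groups if meets_criterion_5(g, **matrix))
    return hits / len(groups)


if __name__ == "__main__":
    candidates = [
        [(0.5, 0.25), (0.5, 0.45)],
        [(0.6, 0.30)],
        [(1.5, 0.25)],
        [(1.6, 0.40), (1.6, 0.30)],
    ]
    p = proportion_p(candidates, origin=(0.0, 0.2), l_mc=1.0, w_mc=0.15,
                     rows=2, cols=2)
    print(p, p < 0.9)
```

Full enumeration is practical only for small candidate lists; for larger collections a random sample of groups would be a natural substitute, which is again an assumption rather than something the claim specifies.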

28 15. A process as in claim 5, wherein the covering markers are substantially evenly distributed across a

29 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of

30 the covering markers is greater than 2 cM or the equivalent thereof, and wherein collection C is

31 essentially the collection of known groups of bi-allelic markers with least common allele frequencies 

32 between 0.3 inclusive and 0.5 inclusive, each group in the collection being substantially similar to the

33 covering markers as a group, wherein a group of bi-allelic markers is a member of collection C if and

34 only if the group substantially meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in

35 the group is chosen from substantially the known set of bi-allelic markers with least common allele 
1 frequencies between 0.3 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in

2 the group is the same as the number of covering markers, wherein criterion (3) is, the chromosomal

3 distribution of the group of markers and the covering markers is substantially similar, and wherein

4 criterion (4) is, the marker type of each group marker and the covering marker with substantially the

5 same chromosomal location is the same, wherein a group that is a member of collection C meets 

6 criterion (5) if and only if (5) there are N or more of the group markers in each cell of 81 percent or more 

7 of the cells of a CL-F matrix with cells of length Lmc and width Wmc in R rows and C columns, wherein P

8 is essentially the proportion of groups in collection C that meet criterion (5), wherein P is less than 90

9 percent.

10 16. A process as in claim 5, wherein the covering markers are substantially evenly distributed across a 

11 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

12 the covering markers is less than or equal to 2 cM or the equivalent thereof; wherein the chromosome or 

13 the chromosomal segment consists essentially of a set of nonoverlapping chromosome segments of 

14 substantially equal length, and wherein each chromosome segment of the set has two or less covering 

15 markers located thereon, wherein one and only one covering marker is located on each of 80 percent or 

16 more of the chromosome segments of the set, and wherein zero or two and only two covering markers

17 are located on each of 20 percent or less of the chromosome segments of the set, and wherein each 

18 chromosome segment that borders a chromosome segment with zero covering markers located thereon 

19 has only one or two covering markers located thereon, and wherein each chromosome segment that 

20 borders a chromosome segment with two covering markers located thereon has only one or zero 

21 covering markers located thereon; wherein collection D is essentially the collection of known groups of 

22 bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive, each 

23 group in the collection being substantially similar to the covering markers as a group; wherein a group of 

24 bi-allelic markers is a member of collection D if and only if the group substantially meets criteria (1), (2),

25 (3) and (4); wherein criterion (1) is, each marker in the group is chosen from substantially the known set

26 of bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive, 

27 wherein criterion (2) is, the number of markers in the group is the same as the number of covering 

28 markers, wherein criterion (3) is, the chromosomal distribution of the group of markers and the covering

29 markers is substantially similar, and wherein criterion (4) is, the marker type of each group marker and

30 the covering marker with substantially the same chromosomal location is the same, wherein a group

31 that is a member of collection D meets criterion (5) if and only if (5) there are N or more of the group

32 markers in each cell of 81 percent or more of the cells of a CL-F matrix with cells of length Lmc and

33 width Wmc in R rows and C columns; wherein P is essentially the proportion of groups in collection D

34 that meet criterion (5), wherein P is less than 90 percent. 
1 17. A process as in claim 5, wherein the covering markers are substantially evenly distributed across a

2 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

3 the covering markers is greater than 2 cM or the equivalent thereof; wherein the chromosome or the 

4 chromosomal segment consists essentially of a set of nonoverlapping chromosome segments of 

5 substantially equal length, and wherein each chromosome segment of the set has two or less covering 

6 markers located thereon, wherein one and only one covering marker is located on each of 80 percent or

7 more of the chromosome segments of the set, and wherein zero or two and only two covering markers

8 are located on each of 20 percent or less of the chromosome segments of the set, and wherein each 

9 chromosome segment that borders a chromosome segment with zero covering markers located thereon 

10 has only one or two covering markers located thereon, and wherein each chromosome segment that 

11 borders a chromosome segment with two covering markers located thereon has only one or zero

12 covering markers located thereon; wherein collection D is essentially the collection of known groups of 

13 bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive, each 

14 group in the collection being substantially similar to the covering markers as a group; wherein a group of 

15 bi-allelic markers is a member of collection D if and only if the group substantially meets criteria (1), (2), 

16 (3) and (4); wherein criterion (1) is, each marker in the group is chosen from substantially the known set

17 of bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive,

18 wherein criterion (2) is, the number of markers in the group is the same as the number of covering 

19 markers, wherein criterion (3) is, the chromosomal distribution of the group of markers and the covering 

20 markers is substantially similar, and wherein criterion (4) is, the marker type of each group marker and 

21 the covering marker with substantially the same chromosomal location is the same; wherein a group 

22 that is a member of collection D meets criterion (5) if and only if (5) there are N or more of the group

23 markers in each cell of 81 percent or more of the cells of a CL-F matrix with cells having length Lmc and 

24 width Wmc in R rows and C columns, wherein P is essentially the proportion of groups in collection D 

25 that meet criterion (5); wherein P is less than 90 percent.

26 18. A process as in claim 4, wherein the covering markers meet criterion a) and one or more of the 

27 criteria b), c), d), e), f), g), h) or i), wherein

28

29 criterion a) is the covering markers are distributed over a chromosomal region of interest, the region of

30 interest being approximately the smallest chromosome interval that contains all of the covering markers,

31 and the covering markers comprise essentially less than all of the polymorphisms in the region of

32 interest, 
33 

34 criterion b) is the covering markers are substantially nonevenly distributed across a chromosome or a 

35 chromosomal segment;
2 criterion c) is the covering markers are substantially evenly distributed across a chromosome or a

3 chromosomal segment, and there is a subgroup of one or more of the covering markers, and each of 

4 the markers in the subgroup is chosen without substantial preference for the least common allele 

5 frequency of each of the markers in the subgroup being close to 0.5, and the number of covering

6 markers in the subgroup is about 5 percent or more of the total number of covering markers, 



8 criterion d) is the covering markers are substantially evenly distributed across a chromosome or a 

9 chromosomal segment, and there is a subgroup of one or more of the covering markers, and each of 

10 the markers in the subgroup is chosen without substantial preference for the least common allele 

11 frequency of each of the markers in the subgroup being close to 0.5;
12

13 criterion e) is the covering markers are substantially evenly distributed across a chromosome or a

14 chromosomal segment, wherein (1) the average chromosomal intermarker distance of the covering

15 markers is greater than 2 cM or the equivalent thereof and the least common allele frequency of one or 

16 more of the covering markers is less than 0.3, or wherein (2) the least common allele frequency of one 

17 or more of the covering markers is less than 0.2, or wherein (3) the average chromosomal intermarker 

18 distance of the covering markers is greater than 10 cM or the equivalent thereof;
19

20 criterion f) is the covering markers are substantially evenly distributed across a chromosome or a

21 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

22 is less than or equal to 2 cM or the equivalent thereof, and wherein the conditional probability the 

23 covering markers were chosen essentially randomly from substantially the known set of bi-allelic

24 markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive is less than or 

25 equal to 10 percent; wherein the conditional probability is substantially conditional on (1) the 

26 approximate chromosomal distribution of the covering markers, (2) the marker type of each covering 

27 marker and (3) the CL-F region being N covered to within the CL-F distance δ by the two or more bi-

28 allelic covering markers;
29

30 criterion g) is the covering markers are substantially evenly distributed across a chromosome or a

31 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers

32 is greater than 2 cM or the equivalent thereof, and wherein the conditional probability the covering

33 markers were chosen essentially randomly from substantially the known set of bi-allelic markers with 

34 least common allele frequencies between 0.3 inclusive and 0.5 inclusive is less than or equal to 10 

35 percent, wherein the conditional probability is substantially conditional on (1) the approximate
1 chromosomal distribution of the covering markers, (2) the marker type of each covering marker and (3)

2 the CL-F region being N covered to within the CL-F distance δ by the two or more bi-allelic covering

3 markers; 
4 

5 criterion h) is the covering markers are substantially evenly distributed across a chromosome or a 

6 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

7 is less than or equal to 2 cM or the equivalent thereof, and wherein collection C is essentially the 

8 collection of known groups of bi-allelic markers with least common allele frequencies between 0.2 

9 inclusive and 0.5 inclusive, each group in the collection being substantially similar to the covering

10 markers as a group, wherein a group of bi-allelic markers is a member of collection C if and only if the 

11 group substantially meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in the group

12 is chosen from substantially the known set of bi-allelic markers with least common allele frequencies

13 between 0.2 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in the group is

14 the same as the number of covering markers, wherein criterion (3) is, the chromosomal distribution of 

15 the group of markers and the covering markers is substantially similar, and wherein criterion (4) is, the

16 marker type of each group marker and the covering marker with substantially the same chromosomal

17 location is the same, wherein a group that is a member of collection C meets criterion (5) if and only if

18 (5) the CL-F region is N covered to within a CL-F distance δ by the two or more bi-allelic covering

19 markers, wherein P is essentially the proportion of groups in collection C that meet criterion (5); wherein

20 P is less than 90 percent; 
21 

22 and criterion i) is the covering markers are substantially evenly distributed across a chromosome or a

23 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

24 is greater than 2 cM or the equivalent thereof, and wherein collection C is essentially the collection of 

25 known groups of bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5

26 inclusive, each group in the collection being substantially similar to the covering markers as a group;

27 wherein a group of bi-allelic markers is a member of collection C if and only if the group substantially 

28 meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in the group is chosen from 

29 substantially the known set of bi-allelic markers with least common allele frequencies between 0.3 

30 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in the group is the same as

31 the number of covering markers, wherein criterion (3) is, the chromosomal distribution of the group of

32 markers and the covering markers is substantially similar, and wherein criterion (4) is, the marker type of

33 each group marker and the covering marker with substantially the same chromosomal location is the

34 same; wherein a group that is a member of collection C meets criterion (5) if and only if (5) the CL-F

35 region is N covered to within a CL-F distance δ by the two or more bi-allelic covering markers; wherein P
is essentially the proportion of groups in collection C that meet criterion (5); wherein P is less than 90
percent.

19. A process as in claim 18, wherein δ is less than or equal to about [1 cM, 0.15] or the equivalent
thereof.

20. A process as in any one of claims 3-19, wherein there is a subgroup of the covering markers, and
the markers in the subgroup are a majority of the covering markers, and each marker in the subgroup is
an SNP, or a bi-allelic marker equivalent formed only from one or more SNPs.

21. A process as in claim 20, wherein the process comprises a computer program.

22. A process as in claim 20 wherein Lmc is less than or equal to about 250,000 bp or the equivalent
thereof, Wmc is less than or equal to about 0.15, wherein the species is human being, wherein the same
statistical linkage test based on allelic association is chosen for each covering marker in step b).

23. A process as in claim 22, wherein the process comprises a computer program.

24. An apparatus for identifying bi-allelic markers linked to a bi-allelic genetic characteristic gene in a
species of creatures, comprising, means to practice each of the steps of a process as in claim 20. 

25. An apparatus as in claim 24, wherein the apparatus comprises oligonucleotide technology or mass 
spectrometry. 

26. An apparatus for identifying bi-allelic markers linked to a bi-allelic genetic characteristic gene in a 
species of creatures, comprising, means to practice each of the steps of a process as in any one of the 
claims 3-19. 

27. An apparatus as in claim 26, wherein the apparatus comprises a computer, the computer being
supplied with proper data and instructions.

28. A process for localizing a bi-allelic genetic characteristic gene in a species of creatures to a CL-F
location, comprising the steps of: a process as in claim 20, further comprising the step f) of: localizing
the gene to the CL-F location of one or more markers that show evidence for linkage based on the
calculations of step e).

29. A process for localizing a bi-allelic genetic characteristic gene in a species of creatures to a CL-F
location, comprising the steps of: a process as in any one of the claims 3-19; further comprising the step f)
of: localizing the gene to the CL-F location of one or more markers that show evidence for linkage
based on the calculations of step e).

30. A process as in claim 29, wherein the process comprises a computer program.

31. An apparatus for localizing a bi-allelic genetic characteristic gene in a species of creatures to a CL-F
location, comprising means to practice each of the steps of a process as in claim 29.

1 32. An apparatus for localizing a bi-allelic genetic characteristic gene in a species of creatures to a CL-F

2 location as in claim 31, wherein the apparatus comprises a computer, the computer being supplied with

3 proper data and instructions. 



5 33. A process for obtaining genotype data/sample allele frequency data for each bi-allelic marker of a 

6 group of two or more bi-allelic covering markers in the chromosomal DNA of one or more individuals of

7 a sample, each individual in the sample being a member of the same species, comprising 
8 

9 a)determining information on the presence or absence of each allele of each bi-allelic marker of a group 

10 of two or more bi-allelic covering markers in the chromosomal DNA of one or more individuals of a 

11 sample, a CL-F region being systematically covered by the two or more bi-allelic covering markers, the

12 CL-F region being a collection of points on a two-dimensional plane, the two-dimensional plane having 

13 the two orthogonal dimensions of chromosomal location and least common allele frequency, and 
14 

15 b) transforming the information of step a) into genotype data/sample allele frequency data for each 

16 marker of the group.
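
Step b) of claim 33, transforming presence/absence information for each allele into genotype data and sample allele frequency data, can be sketched as below. The allele labels, the treatment of a sample in which neither allele is detected as a missing call, and the function names are assumptions made for illustration, not details taken from the claim.

```python
from collections import Counter


def genotype_from_presence(a1_present, a2_present, alleles=("1", "2")):
    """Turn presence/absence of the two alleles into a diploid genotype.

    Returns a 2-tuple of allele labels, or None when neither allele was
    detected (treated here as a missing call).
    """
    if a1_present and a2_present:
        return (alleles[0], alleles[1])
    if a1_present:
        return (alleles[0], alleles[0])
    if a2_present:
        return (alleles[1], alleles[1])
    return None


def sample_allele_frequencies(genotypes):
    """Allele frequencies in a sample; missing calls (None) are skipped."""
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: c / total for allele, c in counts.items()}


def least_common_allele_frequency(genotypes):
    """Frequency of the rarer allele of a bi-allelic marker in the sample."""
    freqs = sample_allele_frequencies(genotypes)
    if len(freqs) > 2:
        raise ValueError("marker is not bi-allelic in this sample")
    # If only one allele was observed, the other has sample frequency 0.
    return min(freqs.values()) if len(freqs) == 2 else 0.0


if __name__ == "__main__":
    calls = [(True, False), (True, True), (False, True), (True, False)]
    genotypes = [genotype_from_presence(a1, a2, ("A", "G")) for a1, a2 in calls]
    print(genotypes)
    print(sample_allele_frequencies(genotypes))
    print(least_common_allele_frequency(genotypes))
```

The least common allele frequency computed this way is the quantity plotted on the second axis of the CL-F plane referred to in step a).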
17 

18 34. A process for obtaining genotype data/sample allele frequency data as in claim 33, wherein the CL-

19 F region is N covered to within a CL-F distance δ by the two or more bi-allelic covering markers, so that

20 each point in the region is within the CL-F distance δ of N or more of the covering markers, wherein δ is

21 equal to about [δcl, 0.25] or the equivalent thereof, δcl is equal to the largest chromosomal length,

22 computed by any method, for which linkage disequilibrium has been observed between any

23 polymorphisms in any population of the species, N is an integer greater than or equal to 1.

24 35. A process for obtaining genotype data/sample allele frequency data as in claim 34, wherein the CL- 

25 F region includes 81 percent or more of the centerpoints of the matrix centerpoint lattice of a CL-F 

26 matrix, the number of cells in the matrix being greater than or equal to three, wherein the matrix has R 

27 rows and C columns, each cell of the matrix being of length Lmc and width Wmc, and Lmc being less than 

28 or equal to about 2δcl, and Wmc being less than or equal to 0.5, δcl is equal to the largest chromosomal

29 length, computed by any method, for which linkage disequilibrium has been observed between any 

30 polymorphisms in any population of the species, there being N or more covering markers in each cell of 

31 81 percent or more of the cells of the matrix, N is an integer greater than or equal to 1; the covering

32 markers being distributed over a chromosomal region of interest, the region of interest being 

33 approximately the smallest chromosome interval that contains all of the covering markers, and the 

34 covering markers comprising essentially less than all of the polymorphisms in the region of interest.

1 36. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the

2 covering markers are substantially nonevenly distributed across a chromosome or a chromosomal

3 segment.

4 37. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the 

5 covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, 

6 and wherein there is a subgroup of one or more of the covering markers, and each of the markers in the 

7 subgroup is chosen without substantial preference for the least common allele frequency of each of the

8 markers in the subgroup being close to 0.5, and the number of covering markers in the subgroup is

9 about 5 percent or more of the total number of covering markers. 

10 38. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the 

11 covering markers are substantially evenly distributed across a chromosome or a chromosomal segment,

12 and wherein there is a subgroup of one or more of the covering markers, and each of the markers in the

13 subgroup is chosen without substantial preference for the least common allele frequency of each of the 

14 markers in the subgroup being close to 0.5. 

15 39. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the 

16 covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, 

17 wherein (1 ) the average chromosomal intermarker distance of the covering markers is greater than 2 cM 

1 8 or the equivalent thereof and the least common allele frequency of one or more of the covering markers 

19 is less than 0.3, or wherein (2) the least common allele frequency of one or more of the covering 

20 markers is less than 0.2, or wherein (3) the average chromosomal intermarker distance of the covering 

21 markers is greater than 10 cM or the equivalent thereof.
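
The three alternatives recited in claim 39 can be read as a simple numeric test. The following is a minimal sketch under stated assumptions, not the applicant's implementation: the function names are hypothetical, the average intermarker distance is taken as the span of the markers divided by the number of gaps, and the "substantially evenly distributed" limitation is not checked.

```python
def mean_intermarker_distance_cm(positions_cm):
    """Average gap between adjacent markers (assumed definition: span / number of gaps)."""
    pos = sorted(positions_cm)
    if len(pos) < 2:
        raise ValueError("need two or more covering markers")
    return (pos[-1] - pos[0]) / (len(pos) - 1)


def meets_claim_39_alternatives(positions_cm, least_common_allele_freqs):
    """Return True if alternative (1), (2) or (3) of claim 39 holds.

    Hypothetical helper: it does not test whether the markers are
    substantially evenly distributed, which the claim also requires.
    """
    distance = mean_intermarker_distance_cm(positions_cm)
    lowest_freq = min(least_common_allele_freqs)
    alt1 = distance > 2.0 and lowest_freq < 0.3   # (1) spacing > 2 cM and some frequency < 0.3
    alt2 = lowest_freq < 0.2                      # (2) some frequency < 0.2
    alt3 = distance > 10.0                        # (3) spacing > 10 cM
    return alt1 or alt2 or alt3
```

For example, three markers at 0, 5 and 10 cM with least common allele frequencies 0.45, 0.25 and 0.5 satisfy alternative (1), while the same frequencies at 0, 1 and 2 cM satisfy none of the three.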

22 40. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the 

23 covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, 

24 wherein the average chromosomal intermarker distance of the covering markers is less than or equal to

25 2 cM or the equivalent thereof, and wherein the conditional probability the covering markers were 

26 chosen essentially randomly from substantially the known set of bi-allelic markers with least common 

27 allele frequencies between 0.2 inclusive and 0.5 inclusive is less than or equal to 10 percent, wherein

28 the conditional probability is substantially conditional on (1) the approximate chromosomal distribution of 

29 the covering markers, (2) the marker type of each covering marker and (3) there being N or more 

30 covering markers in each cell of 81 percent or more of the cells of the matrix.

31 41. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the

32 covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, 

33 wherein the average chromosomal intermarker distance of the covering markers is greater than 2 cM or 

34 the equivalent thereof, and wherein the conditional probability the covering markers were chosen 

35 essentially randomly from substantially the known set of bi-allelic markers with least common allele 
1 frequencies between 0.3 inclusive and 0.5 inclusive is less than or equal to 10 percent; wherein the

2 conditional probability is substantially conditional on (1) the approximate chromosomal distribution of the 

3 covering markers, (2) the marker type of each covering marker and (3) there being N or more covering 

4 markers in each cell of 81 percent or more of the cells of the matrix.

5 42. A process for obtaining genotype data/sample allele frequency data as in claim 40, wherein the

6 chromosome or the chromosomal segment consists essentially of a set of nonoverlapping chromosome 

7 segments of substantially equal length, and wherein each chromosome segment of the set has two or 

8 less covering markers located thereon, wherein one and only one covering marker is located on each of 

9 80 percent or more of the chromosome segments of the set, and wherein zero or two and only two 

10 covering markers are located on each of 20 percent or less of the chromosome segments of the set, 

11 and wherein each chromosome segment that borders a chromosome segment with zero covering

12 markers located thereon has only one or two covering markers located thereon, and wherein each 

13 chromosome segment that borders a chromosome segment with two covering markers located thereon 

14 has only one or zero covering markers located thereon; and wherein the conditional probability the 

15 covering markers were chosen essentially randomly from substantially the known set of bi-allelic 

16 markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive is less than or 

17 equal to 10 percent; wherein the conditional probability is substantially conditional on (1) the 

18 chromosomal distribution of the covering markers on the chromosome segments of the set, (2) the

19 marker type of each covering marker and (3) there being N or more covering markers in each cell of 81 

20 percent or more of the cells of the matrix. 
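
The segment-occupancy conditions of claim 42 can be checked mechanically once the markers are binned into equal-length segments. The sketch below is illustrative only: the helper names and the half-open binning rule are assumptions, and the conditional-probability limitation of the claim is not modelled.

```python
def segment_counts(positions, start, end, n_segments):
    """Count markers in each of n_segments equal-length, nonoverlapping segments."""
    width = (end - start) / n_segments
    counts = [0] * n_segments
    for p in positions:
        if not (start <= p <= end):
            raise ValueError("marker lies outside the chromosome segment")
        # A marker exactly at `end` is assigned to the last segment (assumed rule).
        idx = min(int((p - start) / width), n_segments - 1)
        counts[idx] += 1
    return counts


def occupancy_ok(counts):
    """Check the occupancy conditions recited in claim 42 on per-segment counts."""
    n = len(counts)
    if any(c > 2 for c in counts):                  # two or less markers per segment
        return False
    if 5 * sum(c == 1 for c in counts) < 4 * n:     # exactly one marker on >= 80 percent
        return False
    if 5 * sum(c in (0, 2) for c in counts) > n:    # zero or two markers on <= 20 percent
        return False
    for i, c in enumerate(counts):
        neighbours = [counts[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if c == 0 and 0 in neighbours:              # neighbours of an empty segment hold 1 or 2
            return False
        if c == 2 and 2 in neighbours:              # neighbours of a doubled segment hold 1 or 0
            return False
    return True
```

Integer arithmetic is used for the 80 percent and 20 percent thresholds so that boundary cases such as 8 of 10 segments are not affected by floating-point rounding.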

21 43. A process for obtaining genotype data/sample allele frequency data as in claim 41, wherein the

22 chromosome or the chromosomal segment consists essentially of a set of nonoverlapping chromosome 

23 segments of substantially equal length, and wherein each chromosome segment of the set has two or 

24 less covering markers located thereon, wherein one and only one covering marker is located on each of 

25 80 percent or more of the chromosome segments of the set, and wherein zero or two and only two 

26 covering markers are located on each of 20 percent or less of the chromosome segments of the set, 

27 and wherein each chromosome segment that borders a chromosome segment with zero covering 

28 markers located thereon has only one or two covering markers located thereon, and wherein each

29 chromosome segment that borders a chromosome segment with two covering markers located thereon 

30 has only one or zero covering markers located thereon; and wherein the conditional probability the

31 covering markers were chosen essentially randomly from substantially the known set of bi-allelic 

32 markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive is less than or

33 equal to 10 percent; wherein the conditional probability is substantially conditional on (1) the 

34 chromosomal distribution of the covering markers on the chromosome segments of the set, (2) the 
1 marker type of each covering marker and (3) there being N or more covering markers in each cell of 81

2 percent or more of the cells of the matrix.

3 44. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the 

4 covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, 

5 wherein the average chromosomal intermarker distance of the covering markers is less than or equal to

6 2 cM or the equivalent thereof, and wherein collection C is essentially the collection of known groups of 

7 bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive, each

8 group in the collection being substantially similar to the covering markers as a group; wherein a group of

9 bi-allelic markers is a member of collection C if and only if the group substantially meets criteria (1), (2),

10 (3) and (4), wherein criterion (1) is, each marker in the group is chosen from substantially the known set 

11 of bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive,

12 wherein criterion (2) is, the number of markers in the group is the same as the number of covering 

13 markers, wherein criterion (3) is, the chromosomal distribution of the group of markers and the covering

14 markers is substantially similar, and wherein criterion (4) is, the marker type of each group marker and 

15 the covering marker with substantially the same chromosomal location is the same; wherein a group 

16 that is a member of collection C meets criterion (5) if and only if (5) there are N or more of the group

17 markers in each cell of 81 percent or more of the cells of a CL-F matrix with cells having length Lmc and 

18 width Wmc in R rows and C columns; wherein P is essentially the proportion of groups in collection C 

19 that meet criterion (5); wherein P is less than 90 percent.

20 45. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the 

21 covering markers are substantially evenly distributed across a chromosome or a chromosomal segment,

22 wherein the average chromosomal intermarker distance of the covering markers is greater than 2 cM or 

23 the equivalent thereof, and wherein collection C is essentially the collection of known groups of bi-allelic 

24 markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive, each group in 

25 the collection being substantially similar to the covering markers as a group, wherein a group of bi-allelic

26 markers is a member of collection C if and only if the group substantially meets criteria (1), (2), (3) and

27 (4); wherein criterion (1) is, each marker in the group is chosen from substantially the known set of bi-

28 allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive, wherein

29 criterion (2) is, the number of markers in the group is the same as the number of covering markers, 

30 wherein criterion (3) is, the chromosomal distribution of the group of markers and the covering markers 

31 is substantially similar, and wherein criterion (4) is, the marker type of each group marker and the 

32 covering marker with substantially the same chromosomal location is the same, wherein a group that is 

33 a member of collection C meets criterion (5) if and only if (5) there are N or more of the group markers in

34 each cell of 81 percent or more of the cells of a CL-F matrix with cells having length Lmc and width Wmc 
1 in R rows and C columns; wherein P is essentially the proportion of groups in collection C that meet

2 criterion (5); wherein P is less than 90 percent.

3 46. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the 

4 covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, 

5 wherein the average chromosomal intermarker distance of the covering markers is less than or equal to 

6 2 cM or the equivalent thereof; wherein the chromosome or the chromosomal segment consists 

7 essentially of a set of nonoverlapping chromosome segments of substantially equal length, and wherein 

8 each chromosome segment of the set has two or less covering markers located thereon, wherein one 

9 and only one covering marker is located on each of 80 percent or more of the chromosome segments of 

10 the set, and wherein zero or two and only two covering markers are located on each of 20 percent or 

11 less of the chromosome segments of the set, and wherein each chromosome segment that borders a

12 chromosome segment with zero covering markers located thereon has only one or two covering 

13 markers located thereon, and wherein each chromosome segment that borders a chromosome segment 

14 with two covering markers located thereon has only one or zero covering markers located thereon, 

15 wherein collection D is essentially the collection of known groups of bi-allelic markers with least common 

16 allele frequencies between 0.2 inclusive and 0.5 inclusive, each group in the collection being 

17 substantially similar to the covering markers as a group; wherein a group of bi-allelic markers is a 

18 member of collection D if and only if the group substantially meets criteria (1), (2), and (3): (1) each

19 marker in the group is chosen from substantially the known set of bi-allelic markers with least common 

20 allele frequencies between 0.2 inclusive and 0.5 inclusive, (2) the number of covering markers and the 

21 number of group markers located on each chromosome segment of the set is the same, and (3) there is 

22 a group marker of the same type as each covering marker located on the same chromosome segment 

23 of the set as each covering marker; wherein a group that is a member of collection D substantially 

24 meets criterion (5) if and only if (5) there are N or more of the group markers in each cell of the matrix, 

25 wherein P is essentially the proportion of groups in collection D that meet criterion (5); wherein P is less

26 than 90 percent. 

27 47. A process for obtaining genotype data/sample allele frequency data as in claim 35, wherein the

28 covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, 

29 wherein the average chromosomal intermarker distance of the covering markers is greater than 2 cM or 

30 the equivalent thereof, wherein the chromosome or the chromosomal segment consists essentially of a 

31 set of nonoverlapping chromosome segments of substantially equal length, and wherein each

32 chromosome segment of the set has two or less covering markers located thereon, wherein one and

33 only one covering marker is located on each of 80 percent or more of the chromosome segments of the 

34 set, and wherein zero or two and only two covering markers are located on each of 20 percent or less of

35 the chromosome segments of the set, and wherein each chromosome segment that borders a 






1 chromosome segment with zero covering markers located thereon has only one or two covering 

2 markers located thereon, and wherein each chromosome segment that borders a chromosome segment 

3 with two covering markers located thereon has only one or zero covering markers located thereon, 

4 wherein collection D is essentially the collection of known groups of bi-allelic markers with least common 

5 allele frequencies between 0.3 inclusive and 0.5 inclusive, each group in the collection being

6 substantially similar to the covering markers as a group, wherein a group of bi-allelic markers is a 

7 member of collection D if and only if the group substantially meets criteria (1), (2), and (3): (1) each

8 marker in the group is chosen from substantially the known set of bi-allelic markers with least common

9 allele frequencies between 0.3 inclusive and 0.5 inclusive, (2) the number of covering markers and the 

10 number of group markers located on each chromosome segment of the set is the same, and (3) there is 

11 a group marker of the same type as each covering marker located on the same chromosome segment

12 of the set as each covering marker; wherein a group that is a member of collection D substantially 

13 meets criterion (5) if and only if (5) there are N or more of the group markers in each cell of the matrix; 

14 wherein P is essentially the proportion of groups in collection D that meet criterion (5), wherein P is less 

15 than 90 percent.

16 48. A process for obtaining genotype data/sample allele frequency data as in claim 34, wherein the 

17 covering markers meet criterion a) and one or more of the criteria b), c), d), e), f), g), h) or i), wherein 
18 

19 criterion a) is the covering markers are distributed over a chromosomal region of interest, the region of 

20 interest being approximately the smallest chromosome interval that contains all of the covering markers, 

21 and the covering markers comprise essentially less than all of the polymorphisms in the region of 
22 interest;

23 

24 criterion b) is the covering markers are substantially nonevenly distributed across a chromosome or a 

25 chromosomal segment; 

26 

27 criterion c) is the covering markers are substantially evenly distributed across a chromosome or a 

28 chromosomal segment, and there is a subgroup of one or more of the covering markers, and each of 

29 the markers in the subgroup is chosen without substantial preference for the least common allele 

30 frequency of each of the markers in the subgroup being close to 0.5, and the number of covering

31 markers in the subgroup is about 5 percent or more of the total number of covering markers;
32 

33 criterion d) is the covering markers are substantially evenly distributed across a chromosome or a 

34 chromosomal segment, and there is a subgroup of one or more of the covering markers, and each of 

35 the markers in the subgroup is chosen without substantial preference for the least common allele 
1 frequency of each of the markers in the subgroup being close to 0.5;

2 

3 criterion e) is the covering markers are substantially evenly distributed across a chromosome or a 

4 chromosomal segment, wherein (1) the average chromosomal intermarker distance of the covering 

5 markers is greater than 2 cM or the equivalent thereof and the least common allele frequency of one or 

6 more of the covering markers is less than 0.3, or wherein (2) the least common allele frequency of one 

7 or more of the covering markers is less than 0.2, or wherein (3) the average chromosomal intermarker

8 distance of the covering markers is greater than 10 cM or the equivalent thereof;
9 

10 criterion f) is the covering markers are substantially evenly distributed across a chromosome or a

11 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers

12 is less than or equal to 2 cM or the equivalent thereof, and wherein the conditional probability the 

13 covering markers were chosen essentially randomly from substantially the known set of bi-allelic

14 markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive is less than or

15 equal to 10 percent; wherein the conditional probability is substantially conditional on (1) the

16 approximate chromosomal distribution of the covering markers, (2) the marker type of each covering 

17 marker and (3) the CL-F region being N covered to within the CL-F distance δ by the two or more bi-

18 allelic covering markers;
19 

20 criterion g) is the covering markers are substantially evenly distributed across a chromosome or a 

21 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

22 is greater than 2 cM or the equivalent thereof; and wherein the conditional probability the covering 

23 markers were chosen essentially randomly from substantially the known set of bi-allelic markers with 

24 least common allele frequencies between 0.3 inclusive and 0.5 inclusive is less than or equal to 10 

25 percent; wherein the conditional probability is substantially conditional on (1) the approximate 

26 chromosomal distribution of the covering markers, (2) the marker type of each covering marker and (3)

27 the CL-F region being N covered to within the CL-F distance δ by the two or more bi-allelic covering

28 markers;
29 

30 criterion h) is the covering markers are substantially evenly distributed across a chromosome or a 

31 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

32 is less than or equal to 2 cM or the equivalent thereof; and wherein collection C is essentially the 

33 collection of known groups of bi-allelic markers with least common allele frequencies between 0.2 

34 inclusive and 0.5 inclusive, each group in the collection being substantially similar to the covering 

35 markers as a group, wherein a group of bi-allelic markers is a member of collection C if and only if the 
1 group substantially meets criteria (1), (2), (3) and (4), wherein criterion (1) is, each marker in the group

2 is chosen from substantially the known set of bi-allelic markers with least common allele frequencies

3 between 0.2 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in the group is

4 the same as the number of covering markers, wherein criterion (3) is, the chromosomal distribution of 

5 the group of markers and the covering markers is substantially similar, and wherein criterion (4) is, the 

6 marker type of each group marker and the covering marker with substantially the same chromosomal 

7 location is the same; wherein a group that is a member of collection C meets criterion (5) if and only if

8 (5) the CL-F region is N covered to within a CL-F distance δ by the two or more bi-allelic covering

9 markers; wherein P is essentially the proportion of groups in collection C that meet criterion (5); wherein 
10 P is less than 90 percent; 

11 

12 and criterion i) is the covering markers are substantially evenly distributed across a chromosome or a

13 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

14 is greater than 2 cM or the equivalent thereof, and wherein collection C is essentially the collection of 

15 known groups of bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 

16 inclusive, each group in the collection being substantially similar to the covering markers as a group; 

17 wherein a group of bi-allelic markers is a member of collection C if and only if the group substantially 

18 meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in the group is chosen from

19 substantially the known set of bi-allelic markers with least common allele frequencies between 0.3

20 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in the group is the same as 

21 the number of covering markers, wherein criterion (3) is, the chromosomal distribution of the group of 

22 markers and the covering markers is substantially similar, and wherein criterion (4) is, the marker type of 

23 each group marker and the covering marker with substantially the same chromosomal location is the 

24 same, wherein a group that is a member of collection C meets criterion (5) if and only if (5) the CL-F 

25 region is N covered to within a CL-F distance δ by the two or more bi-allelic covering markers; wherein P

26 is essentially the proportion of groups in collection C that meet criterion (5), wherein P is less than 90 

27 percent.

28 49. A process for obtaining genotype data/sample allele frequency data as in claim 48, wherein δ is less

29 than or equal to about [1 cM, 0.15] or the equivalent thereof. 

30 50. A process for obtaining genotype data/sample allele frequency data as in any one of claims 33-49,
wherein there is a subgroup of the covering markers, and the markers in the subgroup are a majority of
the covering markers, and each marker in the subgroup is an SNP, or a bi-allelic marker equivalent
formed only from one or more SNPs.

51. A process for obtaining genotype data/sample allele frequency data as in claim 50, wherein the 
process comprises a computer program.
1 52. A process for obtaining genotype data/sample allele frequency data as in claim 50, wherein Lmc is

2 less than or equal to about 250,000 bp or the equivalent thereof, Wmc is less than or equal to about 

3 0.15, wherein the species is human being. 

4 53. A process for obtaining genotype data/sample allele frequency data as in claim 52, wherein the 

5 process comprises a computer program. 

6 

7 54. An apparatus for obtaining genotype data/sample allele frequency data for each bi-allelic marker of

8 a group of two or more bi-allelic covering markers in the chromosomal DNA of one or more individuals

9 of a sample, each individual in the sample being a member of the same species, comprising means to

10 practice each of the steps of a process as in any one of the claims 33-49.

11 55. An apparatus for obtaining genotype data/sample allele frequency data for each bi-allelic marker of

12 a group of two or more bi-allelic covering markers in the chromosomal DNA of one or more individuals 

13 of a sample, each individual in the sample being a member of the same species, comprising means to 

14 practice each of the steps of a process as in claim 50. 

15 56. An apparatus as in claim 55, wherein the apparatus comprises oligonucleotide technology or mass 

16 spectrometry.

17 57. An apparatus as in claim 54, wherein the apparatus comprises oligonucleotide technology or mass

18 spectrometry.

19 58. An apparatus as in claim 54, wherein the apparatus comprises a computer, the computer being 

20 supplied with proper data and instructions. 

21 

22 59. A use of one or more copies of a set of oligonucleotides to determine genotype data/sample allele 

23 frequency data for each bi-allelic marker of a group of two or more bi-allelic covering markers for one or 

24 more individuals, each individual being a member of the same species, wherein the group of covering

25 markers systematically cover a CL-F region, the CL-F region being a collection of points on a two- 

26 dimensional plane, the two-dimensional plane having the two orthogonal dimensions of chromosomal 

27 location and least common allele frequency.

28 60. A use as in claim 59, wherein the CL-F region is N covered to within a CL-F distance δ by the two or

29 more bi-allelic covering markers, so that each point in the region is within the CL-F distance δ of N or

30 more of the covering markers, wherein δ is equal to about [δcl, 0.25] or the equivalent thereof, δcl is

31 equal to the largest chromosomal length, computed by any method, for which linkage disequilibrium has 

32 been observed between any polymorphisms in any population of the species, N is an integer greater 

33 than or equal to 1.

34 61. A use as in claim 59, wherein the CL-F region includes 81 percent or more of the centerpoints of the
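
The "N covered to within a CL-F distance" condition of claim 60 can be illustrated on a finite sample of points from the CL-F region. This is only a sketch under a loudly stated assumption: the claims do not define the CL-F distance in this section, so the code assumes that a two-component distance such as [1 cM, 0.25] means a point is within distance of a marker when it is within the first component along chromosomal location and within the second component along least common allele frequency. All names are hypothetical.

```python
def within_clf_distance(point, marker, delta):
    """Assumed reading of CL-F distance: a component-wise bound on (location, frequency)."""
    d_location, d_frequency = delta
    return (abs(point[0] - marker[0]) <= d_location
            and abs(point[1] - marker[1]) <= d_frequency)


def is_n_covered(region_points, markers, delta, n=1):
    """True if every sampled point is within delta of n or more covering markers.

    region_points and markers are (chromosomal location, least common allele
    frequency) pairs; region_points is a finite sample of the CL-F region.
    """
    for point in region_points:
        hits = sum(within_clf_distance(point, m, delta) for m in markers)
        if hits < n:
            return False
    return True
```

With markers at (1.0, 0.25) and (3.0, 0.25) and a distance of (1.0, 0.25), the sample points (0.5, 0.1), (2.0, 0.4) and (3.5, 0.4) are each covered by one or more markers, but not all of them are covered by two.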

35 matrix centerpoint lattice of a CL-F matrix, the number of cells in the matrix being greater than or equal 
1 to three, wherein the matrix has R rows and C columns, each cell of the matrix being of length Lmc and

2 width Wmc, and Lmc being less than or equal to about 2δcl, and Wmc being less than or equal to 0.5, δcl

3 is equal to the largest chromosomal length, computed by any method, for which linkage disequilibrium 

4 has been observed between any polymorphisms in any population of the species, there being N or more 

5 covering markers in each cell of 81 percent or more of the cells of the matrix, N is an integer greater 

6 than or equal to 1, the covering markers being distributed over a chromosomal region of interest, the

7 region of interest being approximately the smallest chromosome interval that contains all of the covering

8 markers, and the covering markers comprising essentially less than all of the polymorphisms in the 

9 region of interest.

10 62. A use as in claim 61, wherein the covering markers are substantially nonevenly distributed across a

11 chromosome or a chromosomal segment.

12 63. A use as in claim 61, wherein the covering markers are substantially evenly distributed across a

13 chromosome or a chromosomal segment, and wherein there is a subgroup of one or more of the 

14 covering markers, and each of the markers in the subgroup is chosen without substantial preference for 

15 the least common allele frequency of each of the markers in the subgroup being close to 0.5, and the

16 number of covering markers in the subgroup is about 5 percent or more of the total number of covering 

17 markers. 

18 64. A use as in claim 61, wherein the covering markers are substantially evenly distributed across a

19 chromosome or a chromosomal segment, and wherein there is a subgroup of one or more of the

20 covering markers, and each of the markers in the subgroup is chosen without substantial preference for

21 the least common allele frequency of each of the markers in the subgroup being close to 0.5. 

22 65. A use as in claim 61, wherein the covering markers are substantially evenly distributed across a

23 chromosome or a chromosomal segment, wherein (1) the average chromosomal intermarker distance of

24 the covering markers is greater than 2 cM or the equivalent thereof and the least common allele 

25 frequency of one or more of the covering markers is less than 0.3, or wherein (2) the least common 

26 allele frequency of one or more of the covering markers is less than 0.2, or wherein (3) the average 

27 chromosomal intermarker distance of the covering markers is greater than 10 cM or the equivalent 

28 thereof.

29 66. A use as in claim 61, wherein the covering markers are substantially evenly distributed across a

30 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

31 the covering markers is less than or equal to 2 cM or the equivalent thereof, and wherein the conditional 

32 probability the covering markers were chosen essentially randomly from substantially the known set of

33 bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive is less

34 than or equal to 10 percent; wherein the conditional probability is substantially conditional on (1) the

35 approximate chromosomal distribution of the covering markers, (2) the marker type of each covering 
1 marker and (3) there being N or more covering markers in each cell of 81 percent or more of the cells of

2 the matrix. 

3 67. A use as in claim 61, wherein the covering markers are substantially evenly distributed across a

4 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

5 the covering markers is greater than 2 cM or the equivalent thereof, and wherein the conditional 

6 probability the covering markers were chosen essentially randomly from substantially the known set of 

7 bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive is less

8 than or equal to 10 percent; wherein the conditional probability is substantially conditional on (1) the 

9 approximate chromosomal distribution of the covering markers, (2) the marker type of each covering 

10 marker and (3) there being N or more covering markers in each cell of 81 percent or more of the cells of 

11 the matrix.

12 68. A use as in claim 66, wherein the chromosome or the chromosomal segment consists essentially of 

13 a set of nonoverlapping chromosome segments of substantially equal length, and wherein each 

14 chromosome segment of the set has two or less covering markers located thereon, wherein one and 

15 only one covering marker is located on each of 80 percent or more of the chromosome segments of the 

16 set, and wherein zero or two and only two covering markers are located on each of 20 percent or less of 

17 the chromosome segments of the set, and wherein each chromosome segment that borders a 

18 chromosome segment with zero covering markers located thereon has only one or two covering 

19 markers located thereon, and wherein each chromosome segment that borders a chromosome segment 

20 with two covering markers located thereon has only one or zero covering markers located thereon; and

21 wherein the conditional probability the covering markers were chosen essentially randomly from 

22 substantially the known set of bi-allelic markers with least common allele frequencies between 0.2 

23 inclusive and 0.5 inclusive is less than or equal to 10 percent, wherein the conditional probability is 

24 substantially conditional on (1) the chromosomal distribution of the covering markers on the

25 chromosome segments of the set, (2) the marker type of each covering marker and (3) there being N or

26 more covering markers in each cell of 81 percent or more of the cells of the matrix.

27 69. A use as in claim 67, wherein the chromosome or the chromosomal segment consists essentially of 

28 a set of nonoverlapping chromosome segments of substantially equal length, and wherein each 

29 chromosome segment of the set has two or less covering markers located thereon, wherein one and 

30 only one covering marker is located on each of 80 percent or more of the chromosome segments of the

31 set, and wherein zero or two and only two covering markers are located on each of 20 percent or less of 

32 the chromosome segments of the set, and wherein each chromosome segment that borders a 

33 chromosome segment with zero covering markers located thereon has only one or two covering 

34 markers located thereon, and wherein each chromosome segment that borders a chromosome segment 

35 with two covering markers located thereon has only one or zero covering markers located thereon; and 
1 wherein the conditional probability the covering markers were chosen essentially randomly from

2 substantially the known set of bi-allelic markers with least common allele frequencies between 0.3 

3 inclusive and 0.5 inclusive is less than or equal to 10 percent; wherein the conditional probability is

4 substantially conditional on (1) the chromosomal distribution of the covering markers on the

5 chromosome segments of the set, (2) the marker type of each covering marker and (3) there being N or

6 more covering markers in each cell of 81 percent or more of the cells of the matrix.
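
The segment conditions recited in claims 68 and 69 (at most two covering markers per segment, exactly one marker on 80 percent or more of the segments, and the bordering rules for segments holding zero or two markers) can be checked mechanically. The following is only an illustrative sketch, not part of the claims: the function name `segment_pattern_ok` and the representation of the segments as a list of per-segment marker counts are assumptions made for this example.

```python
from typing import List


def segment_pattern_ok(counts: List[int]) -> bool:
    """Check the per-segment marker pattern described in the claim.

    counts[i] is the number of covering markers on the i-th of the
    nonoverlapping, equal-length chromosome segments (hypothetical input).
    """
    n = len(counts)
    # Every segment must carry two or fewer covering markers.
    if n == 0 or any(c < 0 or c > 2 for c in counts):
        return False
    ones = sum(1 for c in counts if c == 1)
    # Exactly one marker on 80 percent or more of the segments
    # (equivalently, zero or two markers on 20 percent or less).
    # Integer arithmetic avoids floating-point rounding at the threshold.
    if 5 * ones < 4 * n:
        return False
    for i, c in enumerate(counts):
        neighbours = [counts[j] for j in (i - 1, i + 1) if 0 <= j < n]
        # A segment bordering an empty segment must hold one or two markers.
        if c == 0 and any(m not in (1, 2) for m in neighbours):
            return False
        # A segment bordering a two-marker segment must hold one or zero.
        if c == 2 and any(m not in (0, 1) for m in neighbours):
            return False
    return True
```

For example, ten segments with counts `[1, 1, 1, 1, 2, 0, 1, 1, 1, 1]` satisfy the pattern, while `[1, 1, 1, 1, 0, 0, 1, 1, 1, 1]` does not, because two empty segments border each other.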

7 70. A use as in claim 61, wherein the covering markers are substantially evenly distributed across a

8 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

9 the covering markers is less than or equal to 2 cM or the equivalent thereof; and wherein collection C is 

10 essentially the collection of known groups of bi-allelic markers with least common allele frequencies 

11 between 0.2 inclusive and 0.5 inclusive, each group in the collection being substantially similar to the

12 covering markers as a group; wherein a group of bi-allelic markers is a member of collection C if and

13 only if the group substantially meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in

14 the group is chosen from substantially the known set of bi-allelic markers with least common allele

15 frequencies between 0.2 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in

16 the group is the same as the number of covering markers, wherein criterion (3) is, the chromosomal 

17 distribution of the group of markers and the covering markers is substantially similar, and wherein 

18 criterion (4) is, the marker type of each group marker and the covering marker with substantially the 

19 same chromosomal location is the same; wherein a group that is a member of collection C meets 

20 criterion (5) if and only if (5) there are N or more of the group markers in each cell of 81 percent or more 

21 of the cells of a CL-F matrix with cells having length Lmc and width Wmc in R rows and C columns,

22 wherein P is essentially the proportion of groups in collection C that meet criterion (5); wherein P is less 

23 than 90 percent.
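
The quantity P in claims 70 to 73 is a simple proportion: the share of comparable marker groups (collection C or D) that happen to meet criterion (5). A minimal sketch of that computation follows, assuming the collection has already been enumerated as a list of groups and that criterion (5) is supplied as a predicate. The names `proportion_meeting` and `meets_criterion_5` and the `(location, frequency)` tuple layout are hypothetical and are not taken from the application.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Assumed marker representation: (chromosomal location, least common allele frequency).
Marker = Tuple[float, float]


def proportion_meeting(
    groups: Iterable[Sequence[Marker]],
    meets_criterion_5: Callable[[Sequence[Marker]], bool],
) -> float:
    """Return P, the proportion of groups in the collection meeting criterion (5)."""
    group_list = list(groups)
    if not group_list:
        raise ValueError("the collection of groups is empty")
    hits = sum(1 for g in group_list if meets_criterion_5(g))
    return hits / len(group_list)
```

Under these assumptions, the claim limitation "P is less than 90 percent" corresponds to `proportion_meeting(collection, predicate) < 0.9`.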

24 71. A use as in claim 61, wherein the covering markers are substantially evenly distributed across a

25 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

26 the covering markers is greater than 2 cM or the equivalent thereof, and wherein collection C is 

27 essentially the collection of known groups of bi-allelic markers with least common allele frequencies 

28 between 0.3 inclusive and 0.5 inclusive, each group in the collection being substantially similar to the

29 covering markers as a group, wherein a group of bi-allelic markers is a member of collection C if and

30 only if the group substantially meets criteria (1), (2), (3) and (4), wherein criterion (1) is, each marker in

31 the group is chosen from substantially the known set of bi-allelic markers with least common allele

32 frequencies between 0.3 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in

33 the group is the same as the number of covering markers, wherein criterion (3) is, the chromosomal 

34 distribution of the group of markers and the covering markers is substantially similar, and wherein 

35 criterion (4) is, the marker type of each group marker and the covering marker with substantially the 



AMENDED SHEET 



PCT/US99/04376 





IPEA/US 22 MAY 2000



1 same chromosomal location is the same; wherein a group that is a member of collection C meets 

2 criterion (5) if and only if (5) there are N or more of the group markers in each cell of 81 percent or more 

3 of the cells of a CL-F matrix with cells having length Lmc and width Wmc in R rows and C columns; 

4 wherein P is essentially the proportion of groups in collection C that meet criterion (5); wherein P is less

5 than 90 percent.

6 72. A use as in claim 61, wherein the covering markers are substantially evenly distributed across a

7 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

8 the covering markers is less than or equal to 2 cM or the equivalent thereof; wherein the chromosome or 

9 the chromosomal segment consists essentially of a set of nonoverlapping chromosome segments of 

10 substantially equal length, and wherein each chromosome segment of the set has two or less covering 

11 markers located thereon, wherein one and only one covering marker is located on each of 80 percent or

12 more of the chromosome segments of the set, and wherein zero or two and only two covering markers 

13 are located on each of 20 percent or less of the chromosome segments of the set, and wherein each 

14 chromosome segment that borders a chromosome segment with zero covering markers located thereon 

15 has only one or two covering markers located thereon, and wherein each chromosome segment that 

16 borders a chromosome segment with two covering markers located thereon has only one or zero 

17 covering markers located thereon; wherein collection D is essentially the collection of known groups of 

18 bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive, each 

19 group in the collection being substantially similar to the covering markers as a group; wherein a group of 

20 bi-allelic markers is a member of collection D if and only if the group substantially meets criteria (1), (2),

21 and (3): (1) each marker in the group is chosen from substantially the known set of bi-allelic markers 

22 with least common allele frequencies between 0.2 inclusive and 0.5 inclusive, (2) the number of 

23 covering markers and the number of group markers located on each chromosome segment of the set is 

24 the same, and (3) there is a group marker of the same type as each covering marker located on the 

25 same chromosome segment of the set as each covering marker; wherein a group that is a member of 

26 collection D substantially meets criterion (5) if and only if (5) there are N or more of the group markers in 

27 each cell of the matrix, wherein P is essentially the proportion of groups in collection D that meet 

28 criterion (5); wherein P is less than 90 percent.

29 73. A use as in claim 61, wherein the covering markers are substantially evenly distributed across a

30 chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of 

31 the covering markers is greater than 2 cM or the equivalent thereof; wherein the chromosome or the 

32 chromosomal segment consists essentially of a set of nonoverlapping chromosome segments of 

33 substantially equal length, and wherein each chromosome segment of the set has two or less covering 

34 markers located thereon, wherein one and only one covering marker is located on each of 80 percent or 

35 more of the chromosome segments of the set, and wherein zero or two and only two covering markers

1 are located on each of 20 percent or less of the chromosome segments of the set, and wherein each

2 chromosome segment that borders a chromosome segment with zero covering markers located thereon 

3 has only one or two covering markers located thereon, and wherein each chromosome segment that 

4 borders a chromosome segment with two covering markers located thereon has only one or zero 

5 covering markers located thereon; wherein collection D is essentially the collection of known groups of 

6 bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive, each 

7 group in the collection being substantially similar to the covering markers as a group, wherein a group of 

8 bi-allelic markers is a member of collection D if and only if the group substantially meets criteria (1), (2),

9 and (3): (1) each marker in the group is chosen from substantially the known set of bi-allelic markers

10 with least common allele frequencies between 0.3 inclusive and 0.5 inclusive, (2) the number of

11 covering markers and the number of group markers located on each chromosome segment of the set is 

12 the same, and (3) there is a group marker of the same type as each covering marker located on the 

13 same chromosome segment of the set as each covering marker; wherein a group that is a member of 

14 collection D substantially meets criterion (5) if and only if (5) there are N or more of the group markers in

15 each cell of the matrix; wherein P is essentially the proportion of groups in collection D that meet

16 criterion (5); wherein P is less than 90 percent.

17 74. A use as in claim 60, wherein the covering markers meet criterion a) and one or more of the criteria

18 b), c), d), e), f), g), h) or i), wherein
19 

20 criterion a) is the covering markers are distributed over a chromosomal region of interest, the region of 

21 interest being approximately the smallest chromosome interval that contains all of the covering markers, 

22 and the covering markers comprise essentially less than all of the polymorphisms in the region of 

23 interest, 



25 criterion b) is the covering markers are substantially nonevenly distributed across a chromosome or a 

26 chromosomal segment, 
27 

28 criterion c) is the covering markers are substantially evenly distributed across a chromosome or a

29 chromosomal segment, and there is a subgroup of one or more of the covering markers, and each of 

30 the markers in the subgroup is chosen without substantial preference for the least common allele 

31 frequency of each of the markers in the subgroup being close to 0.5, and the number of covering

32 markers in the subgroup is about 5 percent or more of the total number of covering markers,
33

34 criterion d) is the covering markers are substantially evenly distributed across a chromosome or a

35 chromosomal segment, and there is a subgroup of one or more of the covering markers, and each of

1 the markers in the subgroup is chosen without substantial preference for the least common allele

2 frequency of each of the markers in the subgroup being close to 0.5, 
3 

4 criterion e) is the covering markers are substantially evenly distributed across a chromosome or a 

5 chromosomal segment, wherein (1) the average chromosomal intermarker distance of the covering 

6 markers is greater than 2 cM or the equivalent thereof and the least common allele frequency of one or 

7 more of the covering markers is less than 0.3, or wherein (2) the least common allele frequency of one 
8 or more of the covering markers is less than 0.2, or wherein (3) the average chromosomal intermarker
9 distance of the covering markers is greater than 10 cM or the equivalent thereof, 

10 

11 criterion f) is the covering markers are substantially evenly distributed across a chromosome or a

12 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

13 is less than or equal to 2 cM or the equivalent thereof, and wherein the conditional probability the 

14 covering markers were chosen essentially randomly from substantially the known set of bi-allelic

15 markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive is less than or 

16 equal to 10 percent, wherein the conditional probability is substantially conditional on (1) the 

17 approximate chromosomal distribution of the covering markers, (2) the marker type of each covering 
18 marker and (3) the CL-F region being N covered to within the CL-F distance δ by the two or more
19 bi-allelic covering markers;

20 

21 criterion g) is the covering markers are substantially evenly distributed across a chromosome or a 

22 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

23 is greater than 2 cM or the equivalent thereof; and wherein the conditional probability the covering 

24 markers were chosen essentially randomly from substantially the known set of bi-allelic markers with 

25 least common allele frequencies between 0.3 inclusive and 0.5 inclusive is less than or equal to 10

26 percent, wherein the conditional probability is substantially conditional on (1) the approximate

27 chromosomal distribution of the covering markers, (2) the marker type of each covering marker and (3)

28 the CL-F region being N covered to within the CL-F distance δ by the two or more bi-allelic covering

29 markers, 
30 

31 criterion h) is the covering markers are substantially evenly distributed across a chromosome or a 

32 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

33 is less than or equal to 2 cM or the equivalent thereof; and wherein collection C is essentially the 

34 collection of known groups of bi-allelic markers with least common allele frequencies between 0.2

35 inclusive and 0.5 inclusive, each group in the collection being substantially similar to the covering

1 markers as a group, wherein a group of bi-allelic markers is a member of collection C if and only if the

2 group substantially meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in the group 

3 is chosen from substantially the known set of bi-allelic markers with least common allele frequencies 

4 between 0.2 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in the group is 

5 the same as the number of covering markers, wherein criterion (3) is, the chromosomal distribution of 

6 the group of markers and the covering markers is substantially similar, and wherein criterion (4) is, the

7 marker type of each group marker and the covering marker with substantially the same chromosomal 

8 location is the same; wherein a group that is a member of collection C meets criterion (5) if and only if 

9 (5) the CL-F region is N covered to within a CL-F distance δ by the two or more bi-allelic covering

10 markers; wherein P is essentially the proportion of groups in collection C that meet criterion (5); wherein 

11 P is less than 90 percent; 
12 

13 and criterion i) is the covering markers are substantially evenly distributed across a chromosome or a 

14 chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers 

15 is greater than 2 cM or the equivalent thereof, and wherein collection C is essentially the collection of 

16 known groups of bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 

17 inclusive, each group in the collection being substantially similar to the covering markers as a group;

18 wherein a group of bi-allelic markers is a member of collection C if and only if the group substantially

19 meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in the group is chosen from

20 substantially the known set of bi-allelic markers with least common allele frequencies between 0.3 

21 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in the group is the same as 

22 the number of covering markers, wherein criterion (3) is, the chromosomal distribution of the group of 

23 markers and the covering markers is substantially similar, and wherein criterion (4) is, the marker type of
24 each group marker and the covering marker with substantially the same chromosomal location is the

25 same, wherein a group that is a member of collection C meets criterion (5) if and only if (5) the CL-F 

26 region is N covered to within a CL-F distance δ by the two or more bi-allelic covering markers, wherein P

27 is essentially the proportion of groups in collection C that meet criterion (5), wherein P is less than 90 

28 percent. 

29 75. A use as in claim 74, wherein δ is less than or equal to about [1 cM, 0.15] or the equivalent thereof.

30 76. A use as in any one of claims 59-75, wherein there is a subgroup of the covering markers, and the 

31 markers in the subgroup are a majority of the covering markers, and each marker in the subgroup is an 

32 SNP, or a bi-allelic marker equivalent formed only from one or more SNPs.

33 77. A use as in claim 76 wherein Lmc is less than or equal to about 250,000 bp or the equivalent thereof, 

34 Wmc is less than or equal to about 0.15, wherein the species is human being.
35

1 78. One or more copies of a set of oligonucleotides, the set of oligonucleotides being substantially

2 complementary to a group of two or more bi-allelic covering markers of the same species, wherein the

3 group of covering markers systematically cover a CL-F region, the CL-F region being a collection of 

4 points on a two-dimensional plane, the two-dimensional plane having the two orthogonal dimensions of

5 chromosomal location and least common allele frequency. 

6 79. One or more copies of a set of oligonucleotides as in claim 78, wherein the CL-F region is N covered 

7 to within a CL-F distance δ by the two or more bi-allelic covering markers, so that each point in the

8 region is within the CL-F distance δ of N or more of the covering markers, wherein δ is equal to about

9 [δcl, 0.25] or the equivalent thereof, δcl is equal to the largest chromosomal length, computed by any

10 method, for which linkage disequilibrium has been observed between any polymorphisms in any

11 population of the species, N is an integer greater than or equal to 1.
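
The N-covering condition of claim 79 can be illustrated with a short check. This is a hedged sketch only: it assumes the CL-F distance (written δ here) is a pair of tolerances, one along the chromosomal-location axis and one along the least-common-allele-frequency axis, and that a point is "within" that distance of a marker when both component differences are within tolerance. It also tests a finite sample of points rather than the whole region. The function name `is_n_covered` and the tuple layout are assumptions for this example.

```python
from typing import Iterable, Sequence, Tuple

# Assumed representation: (chromosomal location, least common allele frequency).
Point = Tuple[float, float]


def is_n_covered(
    region_points: Iterable[Point],
    markers: Sequence[Point],
    delta: Tuple[float, float],
    n: int = 1,
) -> bool:
    """True if every sampled region point has n or more markers within delta."""
    d_loc, d_freq = delta
    for loc, freq in region_points:
        # Count markers close to this point on both axes (assumed definition).
        close = sum(
            1
            for m_loc, m_freq in markers
            if abs(m_loc - loc) <= d_loc and abs(m_freq - freq) <= d_freq
        )
        if close < n:
            return False
    return True
```

With markers at `(0.0, 0.3)` and `(2.0, 0.3)` and `delta = (1.0, 0.25)`, the sampled points `(0.5, 0.4)`, `(1.0, 0.2)` and `(2.5, 0.5)` are covered for N = 1 but not for N = 2, since `(0.5, 0.4)` is close to only one marker.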

12 80. One or more copies of a set of oligonucleotides as in claim 78, wherein the CL-F region includes 81 

13 percent or more of the centerpoints of the matrix centerpoint lattice of a CL-F matrix, the number of cells 

14 in the matrix being greater than or equal to three, wherein the matrix has R rows and C columns, each

15 cell of the matrix being of length Lmc and width Wmc, and Lmc being less than or equal to about 2δcl, and

16 Wmc being less than or equal to 0.5, δcl is equal to the largest chromosomal length, computed by any

17 method, for which linkage disequilibrium has been observed between any polymorphisms in any 

18 population of the species, there being N or more covering markers in each cell of 81 percent or more of 

19 the cells of the matrix, N is an integer greater than or equal to 1; the covering markers being distributed

20 over a chromosomal region of interest, the region of interest being approximately the smallest 

21 chromosome interval that contains all of the covering markers, and the covering markers comprising 

22 essentially less than all of the polymorphisms in the region of interest. 
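
The CL-F matrix condition of claim 80 (N or more covering markers in each cell of 81 percent or more of the cells) can be sketched as a two-dimensional binning of the markers. The layout below is an assumption made for illustration: columns index chromosomal location in steps of Lmc, rows index least common allele frequency in steps of Wmc, and each cell is treated as half-open on its upper edges. The function name and parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Assumed representation: (chromosomal location, least common allele frequency).
Marker = Tuple[float, float]


def fraction_of_cells_covered(
    markers: Sequence[Marker],
    loc_start: float,
    freq_start: float,
    l_mc: float,
    w_mc: float,
    rows: int,
    cols: int,
    n: int = 1,
) -> float:
    """Fraction of matrix cells that contain n or more markers."""
    counts = [[0] * cols for _ in range(rows)]
    for loc, freq in markers:
        c = math.floor((loc - loc_start) / l_mc)    # column from location
        r = math.floor((freq - freq_start) / w_mc)  # row from allele frequency
        if 0 <= r < rows and 0 <= c < cols:         # ignore markers outside the matrix
            counts[r][c] += 1
    covered = sum(1 for row in counts for k in row if k >= n)
    return covered / (rows * cols)
```

Under these assumptions the claim condition reads `fraction_of_cells_covered(...) >= 0.81`, together with the separate requirement that the matrix has three or more cells (`rows * cols >= 3`).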

23 81. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are

24 substantially nonevenly distributed across a chromosome or a chromosomal segment.

25 82. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are 

26 substantially evenly distributed across a chromosome or a chromosomal segment, and wherein there is 

27 a subgroup of one or more of the covering markers, and each of the markers in the subgroup is chosen

28 without substantial preference for the least common allele frequency of each of the markers in the 

29 subgroup being close to 0.5, and the number of covering markers in the subgroup is about 5 percent or 

30 more of the total number of covering markers.

31 83. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are 

32 substantially evenly distributed across a chromosome or a chromosomal segment, and wherein there is 

33 a subgroup of one or more of the covering markers, and each of the markers in the subgroup is chosen 

34 without substantial preference for the least common allele frequency of each of the markers in the 

35 subgroup being close to 0.5.

1 84. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are

2 substantially evenly distributed across a chromosome or a chromosomal segment, wherein (1) the

3 average chromosomal intermarker distance of the covering markers is greater than 2 cM or the 

4 equivalent thereof and the least common allele frequency of one or more of the covering markers is less 

5 than 0.3, or wherein (2) the least common allele frequency of one or more of the covering markers is

6 less than 0.2, or wherein (3) the average chromosomal intermarker distance of the covering markers is

7 greater than 10 cM or the equivalent thereof.
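
The three alternatives recited in claim 84 (and in criterion e) of claim 74) combine an average intermarker spacing in cM with the lowest least common allele frequency among the covering markers. A small sketch of that logic follows. The function name and inputs are hypothetical, and the "or the equivalent thereof" language and the even-distribution requirement are not modelled.

```python
from typing import Sequence


def meets_spacing_or_frequency_alternative(
    avg_spacing_cm: float, least_common_allele_freqs: Sequence[float]
) -> bool:
    """True if any of alternatives (1), (2) or (3) holds."""
    # "One or more markers below x" is the same as "the minimum is below x".
    lowest = min(least_common_allele_freqs)
    alt1 = avg_spacing_cm > 2 and lowest < 0.3  # (1) spacing > 2 cM and a marker below 0.3
    alt2 = lowest < 0.2                          # (2) a marker below 0.2
    alt3 = avg_spacing_cm > 10                   # (3) spacing > 10 cM
    return alt1 or alt2 or alt3
```

For instance, markers spaced 3 cM apart on average with frequencies 0.4 and 0.25 satisfy alternative (1), whereas the same frequencies at 1 cM spacing satisfy none of the three.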

8 85. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are 

9 substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average 

10 chromosomal intermarker distance of the covering markers is less than or equal to 2 cM or the 

11 equivalent thereof, and wherein the conditional probability the covering markers were chosen essentially

12 randomly from substantially the known set of bi-allelic markers with least common allele frequencies 

13 between 0.2 inclusive and 0.5 inclusive is less than or equal to 10 percent; wherein the conditional 

14 probability is substantially conditional on (1) the approximate chromosomal distribution of the covering

15 markers, (2) the marker type of each covering marker and (3) there being N or more covering markers in

16 each cell of 81 percent or more of the cells of the matrix.

17 86. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are 

18 substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average 

19 chromosomal intermarker distance of the covering markers is greater than 2 cM or the equivalent 

20 thereof; and wherein the conditional probability the covering markers were chosen essentially randomly 

21 from substantially the known set of bi-allelic markers with least common allele frequencies between 0.3 

22 inclusive and 0.5 inclusive is less than or equal to 10 percent; wherein the conditional probability is 

23 substantially conditional on (1) the approximate chromosomal distribution of the covering markers, (2)
24 the marker type of each covering marker and (3) there being N or more covering markers in each cell of

25 81 percent or more of the cells of the matrix. 

26 87. One or more copies of a set of oligonucleotides as in claim 85, wherein the chromosome or the 

27 chromosomal segment consists essentially of a set of nonoverlapping chromosome segments of 

28 substantially equal length, and wherein each chromosome segment of the set has two or less covering

29 markers located thereon, wherein one and only one covering marker is located on each of 80 percent or 

30 more of the chromosome segments of the set, and wherein zero or two and only two covering markers 

31 are located on each of 20 percent or less of the chromosome segments of the set, and wherein each 

32 chromosome segment that borders a chromosome segment with zero covering markers located thereon

33 has only one or two covering markers located thereon, and wherein each chromosome segment that 

34 borders a chromosome segment with two covering markers located thereon has only one or zero 

35 covering markers located thereon; and wherein the conditional probability the covering markers were

1 chosen essentially randomly from substantially the known set of bi-allelic markers with least common

2 allele frequencies between 0.2 inclusive and 0.5 inclusive is less than or equal to 10 percent, wherein 

3 the conditional probability is substantially conditional on (1) the chromosomal distribution of the covering

4 markers on the chromosome segments of the set, (2) the marker type of each covering marker and (3)

5 there being N or more covering markers in each cell of 81 percent or more of the cells of the matrix.

6 88. One or more copies of a set of oligonucleotides as in claim 86, wherein the chromosome or the 

7 chromosomal segment consists essentially of a set of nonoverlapping chromosome segments of 

8 substantially equal length, and wherein each chromosome segment of the set has two or less covering 

9 markers located thereon, wherein one and only one covering marker is located on each of 80 percent or 

10 more of the chromosome segments of the set, and wherein zero or two and only two covering markers 

11 are located on each of 20 percent or less of the chromosome segments of the set, and wherein each

12 chromosome segment that borders a chromosome segment with zero covering markers located thereon 

13 has only one or two covering markers located thereon, and wherein each chromosome segment that 

14 borders a chromosome segment with two covering markers located thereon has only one or zero 

15 covering markers located thereon; and wherein the conditional probability the covering markers were 

16 chosen essentially randomly from substantially the known set of bi-allelic markers with least common 

17 allele frequencies between 0.3 inclusive and 0.5 inclusive is less than or equal to 10 percent, wherein 

18 the conditional probability is substantially conditional on (1) the chromosomal distribution of the covering

19 markers on the chromosome segments of the set, (2) the marker type of each covering marker and (3) 

20 there being N or more covering markers in each cell of 81 percent or more of the cells of the matrix. 

21 89. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are 

22 substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average 

23 chromosomal intermarker distance of the covering markers is less than or equal to 2 cM or the 

24 equivalent thereof, and wherein collection C is essentially the collection of known groups of bi-allelic 

25 markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive, each group in

26 the collection being substantially similar to the covering markers as a group; wherein a group of bi-allelic

27 markers is a member of collection C if and only if the group substantially meets criteria (1), (2), (3) and

28 (4); wherein criterion (1) is, each marker in the group is chosen from substantially the known set of

29 bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive, wherein

30 criterion (2) is, the number of markers in the group is the same as the number of covering markers, 

31 wherein criterion (3) is, the chromosomal distribution of the group of markers and the covering markers 

32 is substantially similar, and wherein criterion (4) is, the marker type of each group marker and the 

33 covering marker with substantially the same chromosomal location is the same, wherein a group that is 

34 a member of collection C meets criterion (5) if and only if (5) there are N or more of the group markers in 

35 each cell of 81 percent or more of the cells of a CL-F matrix with cells having length Lmc and width Wmc

1 in R rows and C columns; wherein P is essentially the proportion of groups in collection C that meet

2 criterion (5); wherein P is less than 90 percent.

3 90. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are

4 substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average 

5 chromosomal intermarker distance of the covering markers is greater than 2 cM or the equivalent 

6 thereof, and wherein collection C is essentially the collection of known groups of bi-allelic markers with 

7 least common allele frequencies between 0.3 inclusive and 0.5 inclusive, each group in the collection

8 being substantially similar to the covering markers as a group; wherein a group of bi-allelic markers is a 

9 member of collection C if and only if the group substantially meets criteria (1), (2), (3) and (4); wherein 

10 criterion (1) is, each marker in the group is chosen from substantially the known set of bi-allelic markers

11 with least common allele frequencies between 0.3 inclusive and 0.5 inclusive, wherein criterion (2) is,

12 the number of markers in the group is the same as the number of covering markers, wherein criterion 

13 (3) is, the chromosomal distribution of the group of markers and the covering markers is substantially 

14 similar, and wherein criterion (4) is, the marker type of each group marker and the covering marker with

15 substantially the same chromosomal location is the same; wherein a group that is a member of

16 collection C meets criterion (5) if and only if (5) there are N or more of the group markers in each cell of 

17 81 percent or more of the cells of a CL-F matrix with cells having length Lmc and width Wmc in R rows 

18 and C columns, wherein P is essentially the proportion of groups in collection C that meet criterion (5); 

19 wherein P is less than 90 percent. 

20 91. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are

21 substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average 

22 chromosomal intermarker distance of the covering markers is less than or equal to 2 cM or the 

23 equivalent thereof, wherein the chromosome or the chromosomal segment consists essentially of a set 

24 of nonoverlapping chromosome segments of substantially equal length, and wherein each chromosome 

25 segment of the set has two or less covering markers located thereon, wherein one and only one 

26 covering marker is located on each of 80 percent or more of the chromosome segments of the set, and 

27 wherein zero or two and only two covering markers are located on each of 20 percent or less of the 

28 chromosome segments of the set, and wherein each chromosome segment that borders a chromosome 

29 segment with zero covering markers located thereon has only one or two covering markers located 

30 thereon, and wherein each chromosome segment that borders a chromosome segment with two 

31 covering markers located thereon has only one or zero covering markers located thereon, wherein

32 collection D is essentially the collection of known groups of bi-allelic markers with least common allele

33 frequencies between 0.2 inclusive and 0.5 inclusive, each group in the collection being substantially

34 similar to the covering markers as a group, wherein a group of bi-allelic markers is a member of

35 collection D if and only if the group substantially meets criteria (1), (2), and (3): (1) each marker in the

1 group is chosen from substantially the known set of bi-allelic markers with least common allele

2 frequencies between 0.2 inclusive and 0.5 inclusive, (2) the number of covering markers and the

3 number of group markers located on each chromosome segment of the set is the same, and (3) there is 

4 a group marker of the same type as each covering marker located on the same chromosome segment 

5 of the set as each covering marker, wherein a group that is a member of collection D substantially 

6 meets criterion (5) if and only if (5) there are N or more of the group markers in each cell of the matrix, 

7 wherein P is essentially the proportion of groups in collection D that meet criterion (5), wherein P is less 

8 than 90 percent.

9 92. One or more copies of a set of oligonucleotides as in claim 80, wherein the covering markers are 

10 substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average 

11 chromosomal intermarker distance of the covering markers is greater than 2 cM or the equivalent

12 thereof; wherein the chromosome or the chromosomal segment consists essentially of a set of 

13 nonoverlapping chromosome segments of substantially equal length, and wherein each chromosome 

14 segment of the set has two or less covering markers located thereon, wherein one and only one 

15 covering marker is located on each of 80 percent or more of the chromosome segments of the set, and 

16 wherein zero or two and only two covering markers are located on each of 20 percent or less of the 

17 chromosome segments of the set, and wherein each chromosome segment that borders a chromosome 

1 8 segment with zero covering markers located thereon has only one or two covering markers located 

19 thereon, and wherein each chromosome segment that borders a chromosome segment with two 

20 covering markers located thereon has only one or zero covering markers located thereon; wherein 

21 collection D is essentially the collection of known groups of bi-allelic markers with least common allele 

22 frequencies between 0 3 inclusive and 0 5 inclusive, each group in the collection being substantially 

23 similar to the covering markers as a group; wherein a group of bi-ailelic markers is a member of 

24 collection D if and only if the group substantially meets criteria (1). (2), and (3). (1 ) each marker in the 

25 group IS chosen from substantially the known set of bi-allelic markers with least common allele 

26 frequencies between 0.3 inclusive and 0 5 inclusive, (2) the number of covering markers and the 

27 number of group markers located on each chromosome segment of the set is the same, and (3) there is 

28 a group marker of the same type as each covenng marker located on the same chromosome segment 

29 of the set as each covenng marker; wherein a group that is a member of collection D substantially 

30 meets critenon (5) if and only if (5) there are N or more of the group markers in each cell of the matrix, 

31 wherein P is essentially the proportion of groups in collection D that meet criterion (5), wherein P is less 

32 than 90 percent 

93. One or more copies of a set of oligonucleotides as in claim 79, wherein the covering markers meet criterion a) and one or more of the criteria b), c), d), e), f), g), h) or i), wherein

criterion a) is the covering markers are distributed over a chromosomal region of interest, the region of interest being approximately the smallest chromosome interval that contains all of the covering markers, and the covering markers comprise essentially less than all of the polymorphisms in the region of interest;

criterion b) is the covering markers are substantially nonevenly distributed across a chromosome or a chromosomal segment;

criterion c) is the covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, and there is a subgroup of one or more of the covering markers, and each of the markers in the subgroup is chosen without substantial preference for the least common allele frequency of each of the markers in the subgroup being close to 0.5, and the number of covering markers in the subgroup is about 5 percent or more of the total number of covering markers;

criterion d) is the covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, and there is a subgroup of one or more of the covering markers, and each of the markers in the subgroup is chosen without substantial preference for the least common allele frequency of each of the markers in the subgroup being close to 0.5;

criterion e) is the covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, wherein (1) the average chromosomal intermarker distance of the covering markers is greater than 2 cM or the equivalent thereof and the least common allele frequency of one or more of the covering markers is less than 0.3, or wherein (2) the least common allele frequency of one or more of the covering markers is less than 0.2, or wherein (3) the average chromosomal intermarker distance of the covering markers is greater than 10 cM or the equivalent thereof;

criterion f) is the covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers is less than or equal to 2 cM or the equivalent thereof, and wherein the conditional probability the covering markers were chosen essentially randomly from substantially the known set of bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive is less than or equal to 10 percent, wherein the conditional probability is substantially conditional on (1) the approximate chromosomal distribution of the covering markers, (2) the marker type of each covering marker and (3) the CL-F region being N covered to within the CL-F distance δ by the two or more bi-allelic covering markers;

criterion g) is the covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers is greater than 2 cM or the equivalent thereof, and wherein the conditional probability the covering markers were chosen essentially randomly from substantially the known set of bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive is less than or equal to 10 percent, wherein the conditional probability is substantially conditional on (1) the approximate chromosomal distribution of the covering markers, (2) the marker type of each covering marker and (3) the CL-F region being N covered to within the CL-F distance δ by the two or more bi-allelic covering markers;

criterion h) is the covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers is less than or equal to 2 cM or the equivalent thereof; and wherein collection C is essentially the collection of known groups of bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive, each group in the collection being substantially similar to the covering markers as a group; wherein a group of bi-allelic markers is a member of collection C if and only if the group substantially meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in the group is chosen from substantially the known set of bi-allelic markers with least common allele frequencies between 0.2 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in the group is the same as the number of covering markers, wherein criterion (3) is, the chromosomal distribution of the group of markers and the covering markers is substantially similar, and wherein criterion (4) is, the marker type of each group marker and the covering marker with substantially the same chromosomal location is the same; wherein a group that is a member of collection C meets criterion (5) if and only if (5) the CL-F region is N covered to within a CL-F distance δ by the two or more bi-allelic covering markers, wherein P is essentially the proportion of groups in collection C that meet criterion (5); wherein P is less than 90 percent;

and criterion i) is the covering markers are substantially evenly distributed across a chromosome or a chromosomal segment, wherein the average chromosomal intermarker distance of the covering markers is greater than 2 cM or the equivalent thereof, and wherein collection C is essentially the collection of known groups of bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive, each group in the collection being substantially similar to the covering markers as a group; wherein a group of bi-allelic markers is a member of collection C if and only if the group substantially meets criteria (1), (2), (3) and (4); wherein criterion (1) is, each marker in the group is chosen from substantially the known set of bi-allelic markers with least common allele frequencies between 0.3 inclusive and 0.5 inclusive, wherein criterion (2) is, the number of markers in the group is the same as the number of covering markers, wherein criterion (3) is, the chromosomal distribution of the group of markers and the covering markers is substantially similar, and wherein criterion (4) is, the marker type of each group marker and the covering marker with substantially the same chromosomal location is the same; wherein a group that is a member of collection C meets criterion (5) if and only if (5) the CL-F region is N covered to within a CL-F distance δ by the two or more bi-allelic covering markers, wherein P is essentially the proportion of groups in collection C that meet criterion (5), wherein P is less than 90 percent.

94. One or more copies of a set of oligonucleotides as in claim 93, wherein δ is less than or equal to about [1 cM, 0.15] or the equivalent thereof.

95. One or more copies of a set of oligonucleotides as in any one of claims 78-94, wherein there is a subgroup of the covering markers, and the markers in the subgroup are a majority of the covering markers, and each marker in the subgroup is an SNP, or a bi-allelic marker equivalent formed only from one or more SNPs.

96. One or more copies of a set of oligonucleotides as in claim 95, wherein Lmc is less than or equal to about 250,000 bp or the equivalent thereof, Wmc is less than or equal to about 0.15, wherein the species is human being.
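The segment-distribution conditions recited in claims 91 and 92 (at most two covering markers per segment, exactly one marker on 80 percent or more of the segments, and no two neighbouring segments that both hold zero markers or both hold two markers) can be illustrated with a minimal sketch. The function name `check_segment_distribution` and the representation of a chromosome as a list of per-segment marker counts are assumptions made for illustration only; the claims do not prescribe any implementation, and the sketch ignores the "substantially" and "essentially" qualifiers.

```python
from typing import Sequence


def check_segment_distribution(counts: Sequence[int]) -> bool:
    """Check the per-segment marker-count conditions of claims 91 and 92.

    counts[i] is the number of covering markers located on the i-th of a
    series of nonoverlapping, equal-length chromosome segments
    (a hypothetical representation chosen for this sketch).
    """
    n = len(counts)
    if n == 0:
        return False

    # Each segment has two or less covering markers.
    if any(c < 0 or c > 2 for c in counts):
        return False

    # Exactly one marker on 80 percent or more of the segments; the
    # remaining 20 percent or less therefore hold zero or two markers.
    ones = sum(1 for c in counts if c == 1)
    if ones * 100 < 80 * n:
        return False

    # A segment bordering a zero-marker segment must hold one or two markers,
    # and a segment bordering a two-marker segment must hold one or zero.
    # Equivalently: no adjacent pair of zeros and no adjacent pair of twos.
    for left, right in zip(counts, counts[1:]):
        if left == right and left in (0, 2):
            return False
    return True
```

For example, `check_segment_distribution([1] * 8 + [0, 2])` holds, while a run containing two adjacent empty segments does not.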
Abstract

Versions of the invention are directed to methods (including software), apparatus, compositions of matter, and new uses of compositions of matter for a new type of association based linkage study technique using bi-allelic markers. In this new type of association based linkage study technique, the bi-allelic markers used in the new linkage studies are chosen so that the least common allele frequencies of the markers vary systematically over a range or subrange of least common allele frequency and the chromosomal location of the markers vary systematically over one or more chromosomes or chromosomal regions. And the bi-allelic markers are chosen so that the markers' chromosomal locations and least common allele frequencies vary systematically in an essentially independent manner. This selection of markers achieves a systematic distribution of the markers over a two-dimensional region having the orthogonal dimensions of chromosomal location and least common allele frequency. By using the two characteristics or two dimensions of marker chromosomal location and marker allele population frequency in this unique way, the power and systematic nature of genetic linkage studies using association based linkage tests are greatly increased. These unique two-dimensional linkage study techniques increase the power of association based linkage studies to localize trait causing genes or polymorphisms of modest effect such as human disease causing polymorphisms.
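The idea of distributing markers systematically over the two dimensions of chromosomal location and least common allele frequency can be sketched as a simple binning procedure. This is only an illustration under stated assumptions: the function names, the tuple layout `(name, position_cM, least_common_allele_frequency)`, and the rectangular grid of cells are hypothetical and are not taken from the application, which defines coverage in its own terms.

```python
from bisect import bisect_right
from collections import defaultdict


def cell_index(value, edges):
    """Return the index of the bin containing value, or None if outside.

    Bins are [edges[k], edges[k+1]); the last bin also includes its upper
    edge so that a frequency of exactly 0.5 is not dropped.
    """
    if value < edges[0] or value > edges[-1]:
        return None
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def choose_grid_markers(markers, loc_edges, freq_edges, n_per_cell=1):
    """Pick up to n_per_cell markers for every (location, frequency) cell.

    markers: iterable of (name, position_cM, least_common_allele_frequency).
    Returns (chosen, covered) where chosen maps a cell (i, j) to marker
    names and covered is True when every cell holds n_per_cell markers.
    """
    chosen = defaultdict(list)
    for name, pos, lcaf in markers:
        i = cell_index(pos, loc_edges)
        j = cell_index(lcaf, freq_edges)
        if i is None or j is None:
            continue  # marker lies outside the region being covered
        if len(chosen[(i, j)]) < n_per_cell:
            chosen[(i, j)].append(name)

    all_cells = [
        (i, j)
        for i in range(len(loc_edges) - 1)
        for j in range(len(freq_edges) - 1)
    ]
    covered = all(len(chosen.get(c, [])) >= n_per_cell for c in all_cells)
    return {c: names for c, names in chosen.items() if names}, covered
```

A candidate marker set that leaves any cell of the grid empty would be reported as not covered, which mirrors the notion that location and frequency should vary systematically and independently.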
Drawing #1: Computer Program Flowsheet for Process #1 and Process #1A

Step a) Choose markers to systematically cover a CL-F region

Step b) Choose a statistical linkage test for each marker

Step c) Choose a sample for each marker

Step d) Obtain genotype data/sample allele frequency data for each marker and sample and obtain phenotype status for each individual in the samples

Step e) Calculate evidence for linkage between each marker and the gene

Step f) Identify those markers which show evidence for linkage

Step g) Localize the gene to the CL-F region of markers that show evidence for linkage
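The later steps of the flowsheet (calculating evidence for linkage for each marker, identifying markers that show evidence, and localizing the gene to the CL-F region of those markers) can be sketched as follows. This is a minimal sketch under loud assumptions: the allelic 2x2 chi-square test, the 0.05 threshold, the `MarkerData` record and the function names are all hypothetical choices made for illustration; the flowsheet leaves the choice of statistical linkage test open, and no correction for multiple testing is applied here.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class MarkerData:
    name: str
    position_cm: float                # chromosomal location (hypothetical units)
    lcaf: float                       # least common allele frequency
    case_counts: Tuple[int, int]      # (allele 1, allele 2) counts in cases
    control_counts: Tuple[int, int]   # (allele 1, allele 2) counts in controls


def allelic_chi2(case: Tuple[int, int], control: Tuple[int, int]):
    """Pearson chi-square (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    a, b = case
    c, d = control
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table carries no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def localize(markers: Sequence[MarkerData], alpha: float = 0.05) -> Optional[dict]:
    """Test each marker, keep those with p < alpha, and report the CL-F
    region (location range and frequency range) spanned by the hits."""
    hits = [
        m for m in markers
        if allelic_chi2(m.case_counts, m.control_counts)[1] < alpha
    ]
    if not hits:
        return None
    return {
        "markers": [m.name for m in hits],
        "location_cm": (min(m.position_cm for m in hits),
                        max(m.position_cm for m in hits)),
        "lcaf": (min(m.lcaf for m in hits), max(m.lcaf for m in hits)),
    }
```

Reporting both a location range and a frequency range reflects the two-dimensional nature of the CL-F region: the markers that show evidence bound the gene's likely chromosomal position and, at the same time, its likely allele frequency.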